Liangwei Chen, Ruibin Huang, Fengyu Cai
During the 2016 presidential election, breaking news appeared one story after another, greatly affecting public opinion and the poll results. The fairness of the media was a controversial topic throughout the election. The Republican candidate, Donald Trump, even openly criticized many mainstream media outlets as ‘fake news’ on Twitter. This stimulated our interest in studying whether media polarity exists, especially during the presidential election period.
FAKE NEWS - A TOTAL POLITICAL WITCH HUNT!
— Donald J. Trump (@realDonaldTrump) January 10, 2017
The main goal of our project is to mine online news published before polling day (up to November 7th, 2016) to determine whether the public's conventional judgments of the media are correct, i.e. whether 'fake news' is really fake.

We collected existing material on media polarity as our reference. From it, we first took some general judgments from MediaBiasFactCheck. Some conventional results are listed below:
Since these four media outlets are so popular, we chose them as our targets.
We collected monthly news data from the NOW Corpus (News on the Web), established and maintained by Brigham Young University. We mainly extracted three types of data:
Metadata columns: NewsID, Date, Country, Media, NewsURL, Title
After examining the datasets above, we preprocessed the data mainly in the following aspects:
Because the corpus contains thousands of media outlets, we chose the mainstream ones based on the reference, which lists the top 30 US media outlets. We then checked how often each of these outlets appears in our October news dataset and dropped those with fewer publications. With the frequency threshold set to 500, four less popular outlets were filtered out.
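A minimal sketch of this filtering step, assuming the October metadata has been reduced to a list of media names (one entry per article). The function name `filter_media`, the sample outlet names and counts, and the choice of "at least 500" rather than "more than 500" are our assumptions, not the project's code.

```python
from collections import Counter

def filter_media(article_media, reference_media, threshold=500):
    """Split reference outlets into kept/dropped by article count.

    article_media: one media name per article in the month's dataset.
    reference_media: the reference list of mainstream US outlets.
    An outlet is kept if it has at least `threshold` articles (assumed >=).
    """
    counts = Counter(article_media)
    kept = [m for m in reference_media if counts[m] >= threshold]
    dropped = [m for m in reference_media if counts[m] < threshold]
    return kept, dropped

# Toy example with placeholder counts (not the project's real numbers):
articles = ["CNN"] * 600 + ["Fox News"] * 700 + ["Politico"] * 500 + ["Small Outlet"] * 20
kept, dropped = filter_media(articles, ["CNN", "Fox News", "Politico", "Small Outlet"])
print(kept)     # ['CNN', 'Fox News', 'Politico']
print(dropped)  # ['Small Outlet']
```

Keeping the reference order in the output makes it easy to compare the filtered list against the original top-30 ranking.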
Fortunately, all four target outlets remain in this list.
                
                
After a first look at thousands of news articles, we wanted to verify our assumption that topics about the US presidential election were the most popular ones during the three months before the polling date.
We used TF-IDF to extract the top 5 keywords of each document and analyzed their frequency over the three months.
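A minimal sketch of this step, assuming documents are already tokenized and lower-cased. It uses the common weighting tf(t, d) × log(N / df(t)) in pure Python; the project may have used a different TF-IDF variant or a library implementation. Computing IDF separately within each month, and the function names `tfidf_top_k` and `keyword_frequency`, are our assumptions.

```python
import math
from collections import Counter

def tfidf_top_k(docs, k=5):
    """Return the k highest-weighted TF-IDF terms of each tokenized document.

    Assumes every document is a non-empty list of pre-tokenized,
    lower-cased terms; tokenization and stop-word removal happen elsewhere.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    top = []
    for doc in docs:
        tf = Counter(doc)
        # tf(t, d) = count / document length;  idf(t) = log(N / df(t)).
        scores = {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        top.append(ranked[:k])
    return top

def keyword_frequency(docs_by_month, k=5):
    """For each month, count how many documents have each term among their top-k keywords."""
    return {
        month: Counter(term for keywords in tfidf_top_k(docs, k) for term in keywords)
        for month, docs in docs_by_month.items()
    }

# Usage on a toy corpus (placeholder data, not the project's):
docs = [["trump", "trump", "election"], ["clinton", "election"], ["trump", "police"]]
print(tfidf_top_k(docs, k=1))  # [['trump'], ['clinton'], ['police']]
```

Plotting `keyword_frequency(...)[month].most_common(n)` for each month gives the kind of per-month keyword ranking discussed here.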
Surprisingly, the plots for all three months show that campaign terms such as election and voter appear frequently.
                        
                        
Meanwhile, hot topics that drew extensive attention, such as woman and police, also rank high in our keyword analysis. They reflect particular features of this election: Clinton was the first woman to be the presidential candidate of either the Republican or the Democratic Party, and the authority of the police was controversial during the campaign.
                        
                        
What supported our assumption most is that trump was the most popular topic in all three months! We therefore checked media polarity on the topic trump first.
                    
For this polarity check, we narrowed our window to the two months before the polling date (September and October 2016).
In the United States, the Republican Party represents the right wing, while its opponent, the Democratic Party, represents the left or center-left (Wikipedia). Our hypothesis was therefore that, regarding Trump, the Republican presidential candidate, CNN would show a negative attitude, Politico would stay neutral, and Fox News would support him.
As expected, our results for September and October 2016 broadly match the conventional judgments.
For Politico, the shares of positive and negative news are almost the same, and most of its news is neutral.
CNN's news shows a predominantly negative attitude toward Trump in both months.
Furthermore, in September the news from Fox mainly supported Trump. Unexpectedly, however, this support diminished in October. After some research, we found that two events around that time had damaged Trump's reputation.
This plausibly explains why Fox's positive coverage of Trump dropped.
Having checked media polarity on the topic trump, we expanded our study to topics related to trump to make our results more convincing.
                    
                    
Using three calculation methods, described later in the methodology part, we evaluated the relationship between trump and other topics and displayed it as a word chart.
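Two of the three methods named in the methodology part, the simple frequency check and the cosine similarity check, can be sketched as follows. We assume documents are already tokenized and that each term is represented by a binary vector over documents (1 if the term occurs); the project's notebook may weight terms differently, and the function names are hypothetical. Latent Dirichlet Allocation is omitted here; a library implementation such as gensim's `LdaModel` would typically be used for it.

```python
import math
from collections import Counter

def cooccurrence_counts(docs, anchor="trump"):
    """Frequency check: number of documents in which each term appears together with the anchor."""
    counts = Counter()
    for doc in docs:
        terms = set(doc)
        if anchor in terms:
            counts.update(terms - {anchor})
    return counts

def cosine_to_anchor(docs, anchor="trump"):
    """Cosine similarity between the anchor's and each other term's binary document vectors."""
    # Record which documents each term occurs in.
    presence = {}
    for i, doc in enumerate(docs):
        for term in set(doc):
            presence.setdefault(term, set()).add(i)
    anchor_docs = presence.get(anchor, set())
    if not anchor_docs:
        return {}
    # For 0/1 vectors, dot product = shared documents and each norm = sqrt(document count).
    return {
        term: len(anchor_docs & term_docs) / math.sqrt(len(anchor_docs) * len(term_docs))
        for term, term_docs in presence.items()
        if term != anchor
    }

# Toy corpus (placeholder data):
docs = [["trump", "clinton", "election"], ["trump", "clinton"],
        ["trump", "police"], ["clinton", "email"]]
print(cooccurrence_counts(docs).most_common(1))  # [('clinton', 2)]
print(round(cosine_to_anchor(docs)["clinton"], 3))  # 0.667
```

The raw count favors terms that are frequent overall, while cosine similarity normalizes by how often each term occurs, which is one reason to compare both.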
                

Since the selected topics turned out to be mixed up, containing many names, cities and states, we decided to categorize them. The result is as follows:
 
                         
                         
Even though this is only part of our results, some findings are interesting enough to share: they match reality quite well and support our method, for example:
After filtering, the word cloud of the filtered keywords is as follows:
 
From these, considering how controversial they were, we selected Clinton and Republican for the next step of our study.
In this part, we use the relationship between trump and the selected topics to further evaluate the reliability of the conventional judgments.
We chose these two topics because Trump was the representative of the Republican Party, so media attitudes toward the two should be strongly similar. We also cannot ignore Clinton, as she was Trump's opponent in the election.
First, we dig into the media polarity on Clinton. As the first woman presidential nominee of a major party, she drew enormous attention during the campaign.
The following are the media polarity checks on Clinton and Republican across the mainstream media in the three months before the polling date.
Below, we selected four mainstream media outlets and analyzed their polarity on the topics Trump, Clinton and Republican.
From the chart above, we made the following findings:
We plotted the results as a scatter chart with positive polarity on the x-axis and negative polarity on the y-axis. This chart is also consistent with our hypothesis.
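One way to compute the coordinates for such a scatter chart is sketched below, assuming each article has already been labeled positive, neutral or negative and tagged with its media outlet and topic. We take "positive polarity" and "negative polarity" to mean the share of positive and negative articles; that interpretation, the label encoding, and the function name are our assumptions.

```python
from collections import Counter

def polarity_points(labeled):
    """Map each (media, topic) pair to (share of positive, share of negative) articles.

    `labeled` is an iterable of (media, topic, label) tuples with label in
    {"pos", "neu", "neg"}; this encoding is a hypothetical choice.
    """
    totals = Counter()
    counts = Counter()
    for media, topic, label in labeled:
        totals[(media, topic)] += 1
        counts[(media, topic, label)] += 1
    return {
        key: (counts[key + ("pos",)] / n, counts[key + ("neg",)] / n)
        for key, n in totals.items()
    }

# Toy labels (not the project's results):
data = [("CNN", "trump", "neg"), ("CNN", "trump", "neg"),
        ("CNN", "trump", "pos"), ("CNN", "trump", "neu")]
print(polarity_points(data))  # {('CNN', 'trump'): (0.25, 0.5)}
# To draw the scatter, e.g. with matplotlib:
#   xs, ys = zip(*polarity_points(data).values()); plt.scatter(xs, ys)
```

Points near the diagonal indicate balanced coverage; points far below it lean positive and points far above it lean negative.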
We also attached the polarity of all the media we studied below the four-media graph.
By investigating the polarity of news just before the polling date of the 2016 US election, our analysis verified some conventional judgments of the media and challenged others. Throughout the process, we first verified that Trump was the hottest topic during that period. We then observed the differences among media attitudes toward this 'strange' presidential candidate. By focusing on the two entities most related to Trump, Clinton and Republican, and examining the behavior of four typical American media outlets, we eventually found that the public's conventional impressions that Politico tends to be neutral and that Fox is a solid supporter of the Republicans are correct. On the other hand, CNN, which Trump called 'fake news', was to some extent not biased.
As mentioned above, the media is the fourth estate of a civilized society. We hope the media will shoulder the responsibility of reporting truthfully and objectively.
Last but not least, we briefly introduce our methodologies:
I. Keyword Selection: TF-IDF assigns each word in a document a significance weight, and we selected the top-weighted words as the document's keywords.
II. Sentiment Classifier: We combined the results of two sentiment analysis packages in Python, NLTK and TextBlob. We validated against manual labels to find the best threshold for classifying each media outlet's attitude.
III. Topic Expansion: starting from Trump, we expanded the topic set using three methods, detailed in our notebook: a simple frequency check, a cosine similarity check, and Latent Dirichlet Allocation.
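For the sentiment classifier (item II), a minimal sketch of combining two scores and choosing a threshold is shown below. We assume the NLTK score is VADER's compound score (`SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from `nltk.sentiment.vader`) and the TextBlob score is `TextBlob(text).sentiment.polarity`; both range from -1 to 1. The averaging rule, the symmetric neutral band, the accuracy criterion and the function names are our assumptions, not necessarily the project's exact procedure. The code takes precomputed scores, so it runs without either package installed.

```python
def combined_score(vader_compound, textblob_polarity):
    """Average the two polarity scores (assumed combination rule; both lie in [-1, 1])."""
    return (vader_compound + textblob_polarity) / 2

def classify(score, threshold):
    """Label as positive/negative outside a symmetric neutral band [-threshold, threshold]."""
    if score > threshold:
        return "pos"
    if score < -threshold:
        return "neg"
    return "neu"

def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest accuracy against manual labels.

    Ties go to the earliest candidate in the list, since max() keeps the first maximum.
    """
    def accuracy(t):
        hits = sum(classify(s, t) == y for s, y in zip(scores, labels))
        return hits / len(labels)
    return max(candidates, key=accuracy)

# Toy validation set (hypothetical combined scores and manual labels):
scores = [0.6, 0.02, -0.4, 0.08, -0.03]
labels = ["pos", "neu", "neg", "neu", "neu"]
print(best_threshold(scores, labels, [0.0, 0.05, 0.1, 0.2]))  # 0.1
```

A wider neutral band trades fewer false positive/negative labels for more articles classified as neutral, which is why tuning it on manually labeled articles matters.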