Media Polarity

in the presidential election


Applied Data Analysis @EPFL

Liangwei Chen, Ruibin Huang, Fengyu Cai



During the 2016 presidential election, breaking news stories followed one after another and strongly affected public opinion and the polling results. The fairness of the media was a controversial topic throughout the election. The Republican candidate, Donald Trump, even openly criticized many mainstream media outlets as 'fake news' on Twitter. This sparked our interest in studying whether media polarity exists, especially during a presidential election.

The main goal of our project is to mine the online news published before the polling day, November 8th, to figure out whether the public's conventional judgments of the media are correct, i.e. whether 'fake news' is really fake.


Literature Review

We collected existing material on media polarity as our reference. We first took some general judgments from MediaBiasFactCheck. Some of its conventional ratings are listed below:

  1. CNN: left biased
  2. ABC: left-center biased
  3. Politico: Least biased
  4. Fox News: Right biased

Since these four media outlets are so popular, we chose them as our targets.


Data Description and Data Preprocessing


We collected monthly news data from the NOW Corpus (News on the Web), established and maintained by Brigham Young University. We mainly extracted three types of data:

  1. The source file serves as the main directory and provides the overall information about each news article. The format of each line is NewsID Date Country Media NewsURL Title.
  2. The text file contains the content of the news.
  3. The lexicon file contains linguistic information for each word, which is useful for our further NLP analysis.


After inspecting the datasets above, we preprocessed the data in the following ways:

  1. Data Extraction: we retrieved the news from the three months before the polling date (August, September and October). We assume that these three months are the most competitive period of the election.
  2. Text Cleaning: we removed stopwords, punctuation (excluding sentence splitters), and other unnecessary or illegal characters.
  3. Media Grouping: a single media outlet can appear under multiple names, such as 'Fox News', 'fox6', and 'www.fox5sendiego.com'. Therefore, we need to collect these names and group them together as one outlet.
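
The cleaning and grouping steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the tiny stopword set, the alias patterns, and the function names clean_text and group_media are all assumptions made for the example.

```python
import re

# Illustrative stopword set (an assumption); a real run would use a full list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

# Hypothetical alias patterns. They are deliberately crude: "fox" would also
# match unrelated names containing that substring, so a real list needs checking.
MEDIA_PATTERNS = {
    "Fox": re.compile(r"fox", re.IGNORECASE),
    "CNN": re.compile(r"cnn", re.IGNORECASE),
    "ABC": re.compile(r"\babc", re.IGNORECASE),
    "Politico": re.compile(r"politico", re.IGNORECASE),
}


def clean_text(text):
    """Lowercase, drop punctuation except sentence splitters, remove stopwords."""
    # Keep word characters, whitespace and the sentence splitters . ! ?
    text = re.sub(r"[^\w\s.!?]", " ", text.lower())
    tokens = [t for t in text.split() if t.strip(".!?") not in STOPWORDS]
    return " ".join(tokens)


def group_media(raw_name):
    """Map a raw source string to a canonical outlet name, or None if unmatched."""
    for canonical, pattern in MEDIA_PATTERNS.items():
        if pattern.search(raw_name):
            return canonical
    return None
```

With these patterns, 'Fox News', 'fox6' and 'www.fox5sendiego.com' all map to the same outlet, while unknown sources return None and can be dropped or reviewed by hand.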


Media Selection


As there are thousands of media outlets on the web, we decided to choose the mainstream ones based on the reference, which lists the top 30 US media outlets. We then checked how often these outlets appear in our October news dataset and dropped those with fewer publications. With a frequency threshold of 500, four less popular outlets were filtered out.

Luckily, the four target outlets are all in this list.
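
The frequency filter can be sketched in a few lines. The function name filter_by_frequency and the input shape (one outlet name per article) are assumptions; only the threshold of 500 comes from the text above, and the toy example lowers it so the data stays small.

```python
from collections import Counter


def filter_by_frequency(article_media, candidates, min_articles=500):
    """Keep the candidate outlets that published at least min_articles articles."""
    counts = Counter(article_media)
    return [media for media in candidates if counts[media] >= min_articles]


# Toy data with a lowered threshold so the example stays small.
article_media = ["CNN"] * 3 + ["Fox"] * 2 + ["Tiny Gazette"]
kept = filter_by_frequency(
    article_media, ["CNN", "Fox", "Tiny Gazette"], min_articles=2
)
```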


Leading role: Trump!


After a first glance at thousands of pieces of news, we would like to verify our assumption above that topics about the US presidential election are the most popular ones during the three months before the polling date.

We use TF-IDF to find the top 5 keywords for each document, and analyze their frequency over the three months.
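
As a rough illustration of this keyword step, the sketch below scores each term with a plain TF-IDF (term frequency times log of inverse document frequency) and then counts how often each term is selected as a keyword. The exact TF-IDF variant, the function names and the toy documents are assumptions; the project may well have used a library implementation instead.

```python
import math
from collections import Counter


def top_keywords(docs, k=5):
    """Return the k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    results = []
    for tokens in docs:
        term_counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
        # Sort by descending score; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results


def keyword_frequency(keyword_lists):
    """Count in how many documents each term was selected as a keyword."""
    return Counter(term for keywords in keyword_lists for term in keywords)


docs = [
    ["trump", "rally", "ohio", "trump"],
    ["clinton", "email", "rally"],
    ["police", "reform", "rally"],
]
keywords = top_keywords(docs, k=2)
frequency = keyword_frequency(keywords)
```

A term such as rally that appears in every document gets an IDF of zero, so it is never chosen as a keyword; this is what lets distinctive terms rise to the top.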

Keyword Frequency During Three Months


From the three monthly plots, we find that campaign terms like election and voter appear frequently in all three months.

Meanwhile, hot topics drawing extensive attention, such as woman and police, also surge in our keyword analysis. They reflect special features of this election: Clinton was the first woman nominated for president by either the Republican or the Democratic Party, and the powers of the police were a controversial issue during the campaign.

What supports our assumption most is that trump is the most popular topic in all three months! We therefore check the media polarity of the topic trump first.


Media Polarity about Trump


With trump standing out as the most popular topic, we now look at how the media covered him during the two months (September and October 2016) before the polling date.

As we know, in the United States the Republican Party represents the right wing, while its opponent, the Democratic Party, stands for the left or center-left (WIKI). Therefore, our hypothesis is that, regarding Trump, the Republican presidential candidate, CNN may show a negative attitude, Politico should stay neutral, and Fox News will support him.

As expected, for September and October 2016, our results basically match the conventional ratings.

For Politico, the percentages of positive and negative news are clearly almost the same, and most of its news is neutral.

For CNN, the news in both months is predominantly negative towards Trump.

Furthermore, we find that in September the news from Fox mainly supported Trump. Abnormally, however, the support from Fox diminished in October. After some research, we found two events that damaged Trump's reputation:

  1. On October 1st, Trump was reported to have possibly avoided paying federal income tax for approximately 18 years.[source]
  2. On October 7th, a tape in which he made lewd remarks about women was made public.[source]

These events plausibly explain why Fox's positivity towards Trump dropped.


Topic Expansion

After checking media polarity on the topic trump, we would like to expand to topics related to trump to make our result more convincing.

Based on three calculation methods, elaborated later in the Methodology part, we evaluated the relationship between trump and other topics and display it as a word cloud.
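
Two of the three methods, the frequency check and the cosine similarity check, can be sketched as below. This is only an illustration under stated assumptions: each article is represented by its list of keywords, relatedness is measured on binary article-occurrence vectors, and the function names are hypothetical. Latent Dirichlet Allocation is not shown.

```python
import math
from collections import Counter


def cooccurrence_counts(keyword_lists, seed):
    """Count how often each keyword appears in the same article as `seed`."""
    counts = Counter()
    for keywords in keyword_lists:
        if seed in keywords:
            counts.update(k for k in set(keywords) if k != seed)
    return counts


def cosine_similarity(term_a, term_b, keyword_lists):
    """Cosine similarity between the article-occurrence vectors of two terms."""
    a = [1 if term_a in kws else 0 for kws in keyword_lists]
    b = [1 if term_b in kws else 0 for kws in keyword_lists]
    dot = sum(x * y for x, y in zip(a, b))
    # For binary vectors the squared norm equals the number of ones.
    norm = math.sqrt(sum(a)) * math.sqrt(sum(b))
    return dot / norm if norm else 0.0


articles = [
    ["trump", "clinton"],
    ["trump", "republican"],
    ["trump", "clinton", "debate"],
    ["police"],
]
related = cooccurrence_counts(articles, "trump")
similarity = cosine_similarity("trump", "clinton", articles)
```

The counts or similarities can then serve as the word weights when drawing the word cloud.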



We found that the selected topics were cluttered with many names, cities and states, so we decided to categorize them. The results are as follows:

WordCloud of Name
WordCloud of City
WordCloud of State

Although this is only a side result, there are some interesting observations we would like to share: they match reality well and thus support and verify our method. For example:

  1. In the wordcloud of Name, there is Clinton, undoubtedly, and also some key participants in the election, like Ivanka, Trump's daughter, and Giuliani, a senior Republican politician.

  2. In the wordcloud of City, New York, Trump's home base, is highlighted, as is Washington, the political hub of the United States.

  3. In the wordcloud of State, Ohio is dominant, which matches Trump's historic victory in Ohio.

After filtering, the wordcloud of the remaining keywords is as follows:

WordCloud of Filtered Keywords

From these, considering how controversial they are, we selected Clinton and Republican for the next step of our study.


Media Polarity of Clinton and Republican


In this part, we use the relationship between trump and the selected topics to further evaluate the reliability of the conventional judgments.

We chose these two topics because Trump was the representative of the Republican Party, so attitudes towards these two entities should be strongly similar. Furthermore, we cannot ignore Clinton, as she was Trump's opponent in the election.

Firstly, we would like to dig into the media polarity behind Clinton. As the first woman nominated for president by a major party, she attracted a great deal of attention with her campaign.

The following are the media polarity checks for Clinton and Republican among the mainstream media during the three months before the polling date.

Below, we select the four mainstream media outlets and analyze their polarity on the topics Trump, Clinton and Republican.



From the above chart, we draw the following findings:

  1. Politico is always neutral, as expected.
  2. It confirms again that Fox is a firm supporter of the Republican Party. However, this outlet seems to always go to extremes, which may not be responsible towards the public.
  3. ABC did show a tendency to praise Clinton and to criticize the Republican Party and Trump.
  4. The most astonishing finding is that, in our analysis, CNN is not as polarized as expected. This can be seen from the distribution of polarity above: the majority of CNN's news lies in the neutral region.


Other View of Media Polarity

We plot the results as a scatter plot, with positive polarity on the x-axis and negative polarity on the y-axis. We observe that it also matches our hypothesis.

  1. For both Politico and CNN, the x and y values are quite similar, indicating that their opinions are fairly neutral.
  2. Fox's polarity points are far from the origin, which indicates that its attitude tends to go to extremes.
  3. ABC shows an obvious negative polarity on Republican.

We also attached the polarity of all the media outlets we studied under the 4-media graph.
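
One way to obtain the scatter coordinates described above is sketched below. It assumes that an outlet's positive and negative polarity are the shares of its articles labelled positive and negative; the text does not state the exact definition, so both the definition and the function names are assumptions.

```python
import math
from collections import Counter


def polarity_point(labels):
    """Return (positive share, negative share) for one outlet's article labels."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return 0.0, 0.0
    return counts["positive"] / total, counts["negative"] / total


def distance_from_origin(point):
    """Points far from the origin have large positive and/or negative shares."""
    return math.hypot(point[0], point[1])


labels = ["positive", "neutral", "neutral", "negative"]
point = polarity_point(labels)
```

Under this definition, an outlet with similar x and y values is balanced, and one close to the origin is mostly neutral.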


Conclusion

By investigating the polarity of news just before the polling date of the 2016 US election, we verified some conventional judgments about the media and challenged others. Throughout the process, we first verified that Trump was the hottest topic during that period. Then we observed the differences among the media's attitudes towards this 'strange' presidential candidate. By focusing on the two entities most related to Trump, i.e., Clinton and Republican, and looking into the behavior of four typical American media outlets, we eventually found that the public's conventional impressions that Politico tends to be neutral and that Fox is a solid supporter of the Republican Party are both correct. On the other hand, CNN, the so-called 'fake news' according to Trump, was not as biased as expected in our analysis.

The media is often called the fourth power in a civilized society. We hope the media is able to shoulder the responsibility of reporting truthfully and objectively.


Methodology


Last but not least, we would like to briefly introduce our methodologies:

I. Keyword Selection: TF-IDF assigns each word in a document a significance weight, and we select the top-weighted words as the keywords of the passage.

II. Sentiment Classifier: We combined the results of two sentiment analysis packages in Python, NLTK and TextBlob. We then ran a validation test against manual labels to find the best threshold for classifying the media attitude.

III. Topic Expansion: starting from Trump, we would like to expand the topic set. We used three methods, detailed in our notebook: a simple frequency check, a cosine similarity check, and Latent Dirichlet Allocation.
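
The sentiment classifier in item II can be sketched as follows. Several details are assumptions, since the text does not specify them: NLTK's VADER analyzer is used as the NLTK component, the two scores are combined by a simple average, and the threshold is chosen by accuracy on hand-labelled articles. The function names are hypothetical.

```python
def combine_scores(vader_compound, textblob_polarity):
    """Average the two scores; both lie in [-1, 1]. Averaging is an assumption."""
    return (vader_compound + textblob_polarity) / 2.0


def classify(score, threshold=0.1):
    """Label a combined score as positive, negative or neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def best_threshold(scores, labels, candidates):
    """Pick the candidate threshold with the highest accuracy on manual labels."""
    def accuracy(threshold):
        hits = sum(classify(s, threshold) == y for s, y in zip(scores, labels))
        return hits / len(labels)
    return max(candidates, key=accuracy)


def score_article(text):
    """Score one article with both packages (requires nltk and textblob)."""
    # Local imports keep the helpers above usable without the packages.
    # VADER also needs its lexicon: nltk.download("vader_lexicon")
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    from textblob import TextBlob

    vader = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
    blob = TextBlob(text).sentiment.polarity
    return combine_scores(vader, blob)
```

For example, given combined scores for a handful of manually labelled articles, best_threshold(scores, labels, [0.05, 0.1, 0.2]) returns the cut-off that agrees most often with the manual labels.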