Natural language processing on the first 2016 presidential debate
The first debate in the 2016 presidential race was held on September 26. It's no secret that Clinton and Trump are running on drastically different platforms, but how do they compare when it comes to their speech patterns and word choice? To quantify this, I dug into the data, using the debate transcript and natural language processing.
I measured the sentiment of Clinton's and Trump's responses, and examined how emotional their words were throughout the debate. I also looked at each candidate's most commonly used adjectives. Building off the work of Alvin Chang at Vox, I was also able to examine how Clinton's and Trump's speech patterns changed between directly responding to the questions and skirting them.
Sentiment
Using the Google Cloud Natural Language API, I measured the sentiment of each candidate's answers. The polarity of a response is a measure of how positive or negative it is, and the magnitude indicates how much emotion the words convey. The chart below shows the polarity of each candidate's responses, weighted by the magnitude.
Trump and Clinton matched each other's polarity for the first half of the debate, but after his defense of stop-and-frisk around 9:50 PM, Trump's words became much more negative.
Throughout the rest of the debate (the questions on birtherism, cyber security, homegrown terrorism, nuclear weapons, and Clinton's looks and stamina), Clinton became more positive and Trump more negative.
The combination of polarity and magnitude gives us the best understanding of each line's overall sentiment, and each candidate's most positive and negative responses are posted here.
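The weighting described above can be sketched in a few lines. The original analysis was done in R; this is a minimal Python illustration that assumes "weighted by the magnitude" means multiplying polarity by magnitude, which the post does not spell out. The `Response` class, the `weighted_polarity` function, the speaker labels and the sample numbers are all hypothetical, made up for illustration rather than taken from the debate data.

```python
from dataclasses import dataclass


@dataclass
class Response:
    speaker: str
    polarity: float   # negative values = negative tone, positive = positive tone
    magnitude: float  # non-negative; larger means more emotional content


def weighted_polarity(response: Response) -> float:
    """Scale polarity by magnitude (assumed weighting, not confirmed by the post)."""
    return response.polarity * response.magnitude


# Made-up illustrative values, NOT scores from the debate transcript.
responses = [
    Response("SPEAKER_A", 0.5, 2.0),
    Response("SPEAKER_B", -0.5, 4.0),
    Response("SPEAKER_A", -0.25, 0.8),
]

scores = [weighted_polarity(r) for r in responses]
for r, s in zip(responses, scores):
    print(f"{r.speaker}: {s:+.2f}")
```

Under this assumption, a mildly negative but highly emotional response ends up further from zero than a strongly negative but flat one, which is the effect the weighting is meant to capture.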
Braggadocious, and other adjectives
I was also interested in the adjectives each candidate used most frequently during the debate. Using syntax analysis to extract each word's part of speech, I identified the most-used adjectives of each candidate.
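Once every word carries a part-of-speech tag, finding the most-used adjectives is a counting exercise. The sketch below is in Python rather than the R used for the original analysis. It assumes the tagger output is already available as `(word, tag)` pairs with adjectives marked `"ADJ"`; the `top_adjectives` function, the tag names and the sample tokens are hypothetical.

```python
from collections import Counter


def top_adjectives(tagged_tokens, n=3):
    """Return the n most common adjectives as (word, count) pairs.

    tagged_tokens: iterable of (word, pos_tag) pairs, where the tag "ADJ"
    marks an adjective (an assumed tag name for this sketch).
    """
    counts = Counter(
        word.lower() for word, tag in tagged_tokens if tag == "ADJ"
    )
    return counts.most_common(n)


# Hypothetical tagger output for a made-up sentence.
tagged = [
    ("We", "PRON"), ("have", "VERB"), ("great", "ADJ"), ("people", "NOUN"),
    ("and", "CONJ"), ("a", "DET"), ("great", "ADJ"), ("big", "ADJ"),
    ("country", "NOUN"),
]

top = top_adjectives(tagged)
print(top)
```

Lowercasing before counting keeps sentence-initial capitals from splitting one adjective into two entries.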
Answers vs non-answers
As Chang found, the candidates spent a lot of time not answering Holt's questions: 48% of Clinton's words and a whopping 69% of Trump's words were used in non-answers. Using the data Chang compiled, I was able to look at how the candidates' speech patterns differed when answering and not answering the questions.
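A word-share figure like the ones above can be computed from an annotated transcript by counting words per segment. The following Python sketch (the original work used R) assumes each segment is a `(speaker, is_answer, text)` tuple and that words are split on whitespace. That layout, the `non_answer_share` function and the sample segments are assumptions for illustration, not the format of Chang's annotated transcript.

```python
def non_answer_share(segments, speaker):
    """Fraction of a speaker's words that fall in segments labeled as non-answers.

    segments: iterable of (speaker, is_answer, text) tuples (assumed layout).
    Returns 0.0 if the speaker has no words at all.
    """
    total = 0
    non_answer = 0
    for seg_speaker, is_answer, text in segments:
        if seg_speaker != speaker:
            continue
        n_words = len(text.split())  # naive whitespace tokenization
        total += n_words
        if not is_answer:
            non_answer += n_words
    return non_answer / total if total else 0.0


# Made-up segments, not taken from the debate.
segments = [
    ("A", True, "one two three"),
    ("A", False, "four"),
    ("B", False, "x y"),
]

print(f"A: {non_answer_share(segments, 'A'):.0%}")
print(f"B: {non_answer_share(segments, 'B'):.0%}")
```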
Sentence subjects ("I alone can fix it")
Using part-of-speech tagging, I also identified the subjects of each candidate's sentences. Clinton was more inclusive in her words, but only when directly responding to questions: there she used the plural "we" more frequently than the singular "I", and the opposite was true when she avoided a response. Trump, on the other hand, was always more likely to use "I" over "we".
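The "I" versus "we" comparison amounts to tallying sentence subjects separately for answers and non-answers. This Python sketch (again, the original analysis was in R) assumes the subject of each sentence has already been extracted and paired with an answer/non-answer label. The `subject_pronoun_counts` function and the sample data are hypothetical.

```python
from collections import Counter


def subject_pronoun_counts(sentences):
    """Count "i" and "we" subjects, split by whether the sentence is an answer.

    sentences: iterable of (is_answer, subject) pairs, where subject is the
    grammatical subject already extracted by a tagger or parser (assumed input).
    Returns a dict mapping True/False (answer/non-answer) to a Counter.
    """
    counts = {True: Counter(), False: Counter()}
    for is_answer, subject in sentences:
        pronoun = subject.lower()
        if pronoun in ("i", "we"):
            counts[is_answer][pronoun] += 1
    return counts


# Made-up subjects, not taken from the debate.
sentences = [
    (True, "We"), (True, "we"), (True, "I"),
    (False, "I"), (False, "They"),
]

counts = subject_pronoun_counts(sentences)
print("answers:", dict(counts[True]))
print("non-answers:", dict(counts[False]))
```

Comparing `counts[True]` with `counts[False]` for each speaker gives the kind of contrast described above.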
Non-answer phrases
The words each candidate used when directly answering the questions are all, unsurprisingly, highly related to the questions Holt asked. What's interesting here are the topics the candidates defaulted to when avoiding a response.
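The post does not say how the non-answer phrases were extracted. One simple possibility, shown here purely as an assumption, is to count the most frequent two-word sequences (bigrams) in the non-answer text. The Python sketch below uses a hypothetical `top_bigrams` function and made-up sentences; it is not the method from the project's R code.

```python
import re
from collections import Counter


def top_bigrams(texts, n=2):
    """Return the n most common adjacent word pairs across the given texts.

    Words are lowercased runs of letters and apostrophes. Pairs are counted
    within each text and, for simplicity, may span sentence boundaries.
    """
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(zip(words, words[1:]))
    return counts.most_common(n)


# Made-up non-answer text, not quotes from the debate.
non_answers = ["The plan works. The plan helps.", "We like the plan."]

best = top_bigrams(non_answers, n=1)
print(best)
```

In practice a stop-word filter would probably be needed so that pairs such as "the plan" do not crowd out more topical phrases.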
A handful of my findings didn't make it into this post. If you're interested in more, there's some additional analysis, including multiple classification models, in the project's GitHub repo. The text of this article (excluding this sentence) has polarity -0.4 and magnitude 15.5, so despite my best efforts it's leaning slightly negative.
R code posted here.
Many thanks to Alvin Chang and Vox for their permission to use their annotated transcript, and to Kelsey Scherer for designing the charts and lead image.
Analysis was performed in R. Plots were generated using ggplot2, and then styled by Scherer using Sketch.
The sentiment scores, part-of-speech tags, and all of the other NLP datasets can be found in the GitHub repo.











