Pitfalls in Tolerance: An Analysis of Twitter Data for Electoral Predictions

By Lesley Dudden | February 05, 2014

As we all know, Twitter is a platform for creating and sharing short bursts of information instantly and without borders. Scholars have taken note and now analyze Twitter data to “take the pulse” of society. Since 2010, a number of studies, ranging from theoretical works to data analysis, have tried to assess the viability of Twitter as a substitute for traditional electoral prediction methods. These studies have been inspired by the lure of real-time information and the ease of collecting it.

In a recent study, Daniel Gayo-Avello of the University of Oviedo in Spain conducted a meta-analysis of fifteen prior studies to assess whether Twitter data can be used to predict election results. He found that the "presumed predictive power regarding electoral prediction has been somewhat exaggerated: although social media may provide a glimpse on electoral outcomes current research does not provide strong evidence to support it can currently replace traditional polls."

The author explains:

"A considerable number of scholars have used tweets as indicators of electoral outcomes in different countries. Often, these studies utilize the number of mentions of a party or a candidate on Twitter before an upcoming election as an indicator of the vote share that party or candidate will receive.... In election polls, respondents are asked to state the party or candidate they are likely to vote for. Then, mentions are counted and the proportion of mentions is interpreted as an estimate of the vote share a party or candidate will receive. Likewise, in social media predictions, party and candidate mentions are counted and the proportion of mentions is interpreted as an estimate of the vote share a party or candidate will receive.  [However] social media users are not asked to state their voting intention. Rather, they mention a party or candidate for any reason. When mentioning a party, they might praise it, criticize it or be neutral towards it. So, social media data do not necessarily reflect voting intentions. In addition, social media users certainly do not represent a random sample from the electorate, while professional survey analysts make considerable efforts to draw a random sample from the electorate. In effect, social media data can differ dramatically from election survey data." 

Gayo-Avello found that tweet counting is not a reliable election prediction method. He further found that it is unclear whether sentiment analysis (weighing positive, negative, or neutral expressions toward a party or candidate) has an impact on Twitter-based predictions, although lexicon-based sentiment analysis does tend to outperform tweet counting in its predictive value.
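To make the two approaches concrete, here is a minimal Python sketch of mention counting and of a lexicon-based variant that counts only tweets scored as positive. Everything in it is an assumption made for illustration: the party names, the sample tweets, the tiny word lists, and the function names `sentiment` and `mention_shares` are hypothetical and are not taken from the study or from any of the papers it reviews.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon; real sentiment lexicons hold thousands of scored words.
POSITIVE = {"great", "good", "love", "support"}
NEGATIVE = {"bad", "corrupt", "hate", "fail"}

def sentiment(text):
    """Label a tweet by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def mention_shares(tweets, parties, positive_only=False):
    """Share of party mentions per party; optionally count only positive tweets."""
    counts = Counter()
    for text in tweets:
        if positive_only and sentiment(text) != "positive":
            continue
        lowered = text.lower()
        for party in parties:
            if party.lower() in lowered:
                counts[party] += 1
    total = sum(counts.values())
    if total == 0:
        return {party: 0.0 for party in parties}
    return {party: counts[party] / total for party in parties}

# Hypothetical tweets and party names, made up for illustration.
tweets = [
    "I love the Reds",
    "The Reds are corrupt",
    "The Reds fail on jobs again",
    "I support the Blues",
    "Oh great, another Reds scandal",
]
parties = ["Reds", "Blues"]
raw_shares = mention_shares(tweets, parties)
positive_shares = mention_shares(tweets, parties, positive_only=True)
```

In this toy data, plain counting credits the Reds with four of the five mentions even though two of those tweets criticize them. The lexicon filter drops the critical tweets, but it still scores the last tweet as positive because it contains the word "great", which shows how simple word matching can misread tone.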

NDI uses a social media monitoring platform, Crimson Hexagon, in an election context to assess what individuals in a given country say online about issues such as corruption or violence. We also use social media monitoring to assess attitudes

One of the fundamental problems of using Twitter data as a predictive indicator is that the platform’s user base is not a representative sample of the population. Twitter is dominated by younger users and is used by only a small percentage of the population. This means that data collected through the platform needs to be weighted according to demographic strata lest it tilt toward a few select political opinions. There are also “noisy” tweets, those originating from spammers, bots, and trolls, that need to be filtered out. Twitter users often employ sarcasm and humor, which may well indicate political leanings but are difficult to classify in either a sentiment analysis or a tweet count study.
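A rough sketch of what weighting by demographic strata can look like is below. It assumes, hypothetically, that each sampled user's age group is known and that the electorate's true age mix is available from a census or similar source. All of the numbers and the function name `weighted_share` are made up for illustration. In practice, inferring the demographics of Twitter users is itself a hard problem.

```python
def weighted_share(support_by_group, sample_counts, population_shares):
    """Post-stratified support: weight each group's rate by its population share."""
    estimate = 0.0
    for group, pop_share in population_shares.items():
        if sample_counts.get(group, 0) == 0:
            raise ValueError(f"no sampled users in group {group!r}")
        rate = support_by_group[group] / sample_counts[group]
        estimate += pop_share * rate
    return estimate

# Hypothetical sample: users per age group, and how many of them favor one party.
sample_counts = {"18-29": 600, "30-49": 300, "50+": 100}
support = {"18-29": 360, "30-49": 120, "50+": 30}
# Hypothetical age mix of the actual electorate (fractions sum to 1).
population = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

raw = sum(support.values()) / sum(sample_counts.values())
weighted = weighted_share(support, sample_counts, population)
```

Because younger users dominate this invented sample, the unweighted figure overstates support for the party they favor. Reweighting to the assumed population mix pulls the estimate down noticeably.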

Another potential pitfall of Twitter data is as simple as the period during which it is collected. Data collected over different time periods was found to produce significant variations in prediction results. According to Gayo-Avello, the debate over whether Twitter data can legitimately be used to predict elections is centered on “methodological issues regarding data collection, vote prediction, and performance evaluation.” Gayo-Avello also emphasizes that studies using Twitter data should seek to develop methods to infer the size of silent majorities, in addition to “de-noising” the data by removing irrelevant tweets.
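One simple way to see how much the collection window matters is to score each window's prediction against the actual result. The sketch below uses mean absolute error, a common way to score a forecast that is assumed here for illustration rather than quoted from the study. The vote shares, window labels, and function name are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, across parties."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)

# Hypothetical vote shares in percent, made up for illustration.
actual = {"Reds": 48.0, "Blues": 52.0}
predictions_by_window = {
    "final week": {"Reds": 50.0, "Blues": 50.0},
    "final three months": {"Reds": 60.0, "Blues": 40.0},
}

# Error of each collection window's prediction against the actual result.
errors = {
    window: mean_absolute_error(predicted, actual)
    for window, predicted in predictions_by_window.items()
}
```

With these invented figures, the same method scores very differently depending only on when the tweets were gathered, which is why a study should state its collection period before anyone judges its accuracy.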

In short, take with a grain of salt any media coverage hyping Twitter as an all-powerful platform for predicting elections. For the details, read the entire study, “A meta-analysis of state-of-the-art electoral prediction from Twitter data.”