Ad verba per numeros


Friday, June 18, 2010, 08:15 AM
Predicting elections from Twitter data is becoming a red hot topic of research.

In a previous post I discussed some recent (and serious) studies in the field. More recently, there has been some press coverage, like this article by New Scientist: "Blogs and tweets could predict the future". In addition, some reports claim to have (rather accurately) predicted elections in the United Kingdom and Belgium by merely counting the number of mentions each candidate received on Twitter (1, 2, 3).

In that previous post I tried to argue why it's difficult. In this one I'll just try to provide a headline to bring some balance to the force: "Why you cannot (always and consistently) predict elections from Twitter".

  • First, not everybody is using Twitter. In particular, many voters are not using Twitter.
  • Second, not everybody using Twitter is publicly tweeting. In particular, many voters who use Twitter are tweeting in private and only their friends and acquaintances can read their tweets.
  • Third, not everybody publicly tweeting tweets about their political views. In particular, many voters who publicly tweet do not express their political views.
  • Fourth, not everybody publicly tweeting their political views is actually voting.
  • Fifth, not every conceivable tweet about the political views of someone who is actually voting can be automatically *and accurately* analyzed to infer the actual vote.
Hence, by just analyzing the public Twitter stream with regard to one particular election:

(1) You have missed:

  • The opinions of those not using Twitter.
  • The opinions of those not publicly tweeting.
  • The opinions of those publicly tweeting but not discussing politics.

(2) You have taken into account:

  • The opinions of those discussing politics on Twitter but who are not voting.

And, finally,

(3) You have inferred votes using noisy and not-that-accurate methods.

So, depending on who is and who isn't using Twitter, your prediction can land close to, or very far from, a given election's actual outcome.
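
To make the argument above a bit more concrete, here is a minimal sketch in Python of how the "funnel" from electorate to visible political tweets can distort a mention count. Everything in it is an assumption made up for illustration: the two groups, their sizes, and every rate are hypothetical numbers, not data from any real election. It also simplifies a lot: each visible person counts as one mention for their preferred candidate, non-voters tweet exactly like voters, and the noisy inference step (the fifth point) is ignored altogether.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """A hypothetical slice of the electorate (all numbers are invented)."""
    name: str
    size: int          # eligible people in the group
    support_a: float   # fraction preferring candidate A (the rest prefer B)
    turnout: float     # fraction who actually vote
    on_twitter: float  # fraction using Twitter
    public: float      # fraction of Twitter users who tweet publicly
    political: float   # fraction of public users who tweet about politics


def vote_share_a(groups):
    """Candidate A's share of the votes actually cast."""
    votes_a = sum(g.size * g.turnout * g.support_a for g in groups)
    votes = sum(g.size * g.turnout for g in groups)
    return votes_a / votes


def twitter_share_a(groups):
    """Candidate A's share of visible political tweeters.

    Turnout is deliberately ignored here: a mention counter cannot tell
    whether the person tweeting will actually vote.
    """
    def visible(g):
        return g.size * g.on_twitter * g.public * g.political

    mentions_a = sum(visible(g) * g.support_a for g in groups)
    mentions = sum(visible(g) for g in groups)
    return mentions_a / mentions


# Hypothetical electorate: the younger group tweets more but votes less.
groups = [
    Group("younger", 400_000, support_a=0.65, turnout=0.40,
          on_twitter=0.30, public=0.8, political=0.2),
    Group("older", 600_000, support_a=0.40, turnout=0.70,
          on_twitter=0.05, public=0.8, political=0.2),
]

print(f"Actual vote share for A:  {vote_share_a(groups):.1%}")
print(f"Twitter-based share for A: {twitter_share_a(groups):.1%}")
```

With these invented numbers, candidate A gets about 60% of the visible political tweeters but only about 47% of the votes, so the mention count calls the wrong winner. Make the two groups use Twitter at the same rate and vote at the same rate and the two figures coincide, which is the point: whether counting works depends on who is and who isn't tweeting.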

What's the problem with this?

We're seeing lots of buzz about positive results but very little discussion of negative results: why they occur and how they could be avoided.

In short, take it easy guys... It's a really awesome field of research but it simply cannot be that easy.


