Towards detecting influenza epidemics by analyzing Twitter messages

  • Authors: Aron Culotta
  • Affiliations: Southeastern Louisiana University, Hammond, LA
  • Venue: Proceedings of the First Workshop on Social Media Analytics
  • Year: 2010

Abstract

Rapid response to a health epidemic is critical to reduce loss of life. Existing methods mostly rely on expensive surveys of hospitals across the country, typically with lag times of one to two weeks for influenza reporting, and even longer for less common diseases. In response, there have been several recently proposed solutions to estimate a population's health from Internet activity, most notably Google's Flu Trends service, which correlates search term frequency with influenza statistics reported by the Centers for Disease Control and Prevention (CDC). In this paper, we analyze messages posted on the micro-blogging site Twitter.com to determine whether a similar correlation can be uncovered. We propose several methods to identify influenza-related messages and compare a number of regression models to correlate these messages with CDC statistics. Using over 500,000 messages spanning 10 weeks, we find that our best model achieves a correlation of 0.78 with CDC statistics by leveraging a document classifier to identify relevant messages.
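
The pipeline the abstract describes (flag influenza-related messages, aggregate them by week, then correlate the weekly signal with CDC statistics) can be illustrated with a minimal sketch. This is not the paper's method: the keyword list, the function names, and the use of a plain keyword match in place of a trained document classifier are all assumptions made for illustration, and Pearson correlation stands in for whichever regression models the paper compares.

```python
import re
from math import sqrt

# Hypothetical keyword list; the abstract does not say which terms were used.
FLU_KEYWORDS = frozenset({"flu", "influenza", "fever", "cough"})


def is_flu_related(message, keywords=FLU_KEYWORDS):
    """Return True if any whole word in the message is a flu keyword.

    A simple stand-in for the document classifier mentioned in the abstract.
    """
    words = re.findall(r"[a-z]+", message.lower())
    return any(word in keywords for word in words)


def weekly_fraction(messages_by_week):
    """For each week (a list of messages), return the fraction flagged as flu-related."""
    fractions = []
    for messages in messages_by_week:
        if not messages:
            fractions.append(0.0)  # assumption: an empty week counts as zero signal
            continue
        flagged = sum(1 for m in messages if is_flu_related(m))
        fractions.append(flagged / len(messages))
    return fractions


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

In use, `weekly_fraction` would be applied to the messages grouped by week, and `pearson` would compare the resulting series against the CDC's weekly influenza figures for the same weeks. Both inputs would have to be supplied by the reader; no data from the paper is reproduced here.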