Towards building large-scale distributed systems for twitter sentiment analysis

  • Authors:
  • Vinh Ngoc Khuc;Chaitanya Shivade;Rajiv Ramnath;Jay Ramanathan

  • Affiliations:
  • The Ohio State University;The Ohio State University;The Ohio State University;The Ohio State University

  • Venue:
  • Proceedings of the 27th Annual ACM Symposium on Applied Computing
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

In recent years, social networks have become very popular. Twitter, a micro-blogging service, is estimated to have about 200 million registered users and these users create approximately 65 million tweets a day. Twitter users usually show their opinion about topics of their interest. The challenge is that each tweet is limited in 140 characters, and is hence very short. It may contain slang and misspelled words. Thus, it is difficult to apply traditional NLP techniques which are designed for working with formal languages, into Twitter domain. Another challenge is that the total volume of tweets is extremely high, and it takes a long time to process. In this paper, we describe a large-scale distributed system for real-time Twitter sentiment analysis. Our system consists of two components: a lexicon builder and a sentiment classifier. These two components are capable of running on a large-scale distributed system since they are implemented using a MapReduce framework and a distributed database model. Thus, our lexicon builder and sentiment classifier are scalable with the number of machines and the size of data. The experiments also show that our lexicon has a good quality in opinion extraction, and the accuracy of the sentiment classifier can be improved by combining the lexicon with a machine learning technique.