In recent years, social networks have become very popular. Twitter, a micro-blogging service, is estimated to have about 200 million registered users, who create approximately 65 million tweets a day. Twitter users often express opinions about topics that interest them. One challenge is that each tweet is limited to 140 characters and is hence very short; it may also contain slang and misspelled words. It is therefore difficult to apply traditional NLP techniques, which are designed for formal language, to the Twitter domain. Another challenge is that the total volume of tweets is extremely high and takes a long time to process. In this paper, we describe a large-scale distributed system for real-time Twitter sentiment analysis. Our system consists of two components: a lexicon builder and a sentiment classifier. Both components are implemented using a MapReduce framework and a distributed database model, so they can run on a large-scale distributed system and scale with the number of machines and the size of the data. Our experiments also show that the lexicon is of good quality for opinion extraction, and that the accuracy of the sentiment classifier can be improved by combining the lexicon with a machine learning technique.
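To make the two-component design concrete, the following is a minimal single-process sketch of a lexicon builder written as a map step and a reduce step, followed by a lexicon-based classifier. It is an illustration only: the abstract does not specify how the lexicon is actually built, so the scoring rule (average label of the tweets a word appears in), the seed tweets, and all function names (`tokenize`, `map_phase`, `reduce_phase`, `classify`) are assumptions made for this example, and a plain Python generator stands in for the MapReduce framework and distributed database.

```python
import re
from collections import defaultdict

# Assumed tokenizer: lowercase alphabetic tokens only (hypothetical choice).
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Split a tweet into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def map_phase(labeled_tweets):
    """Map step: emit one (word, label) pair per distinct word in each tweet."""
    for text, label in labeled_tweets:
        for word in set(tokenize(text)):
            yield word, label


def reduce_phase(pairs):
    """Reduce step: average the labels seen for each word, giving a score in [-1, 1]."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for word, label in pairs:
        totals[word] += label
        counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def classify(text, lexicon):
    """Sum the lexicon scores of the tweet's words and map the sign to a class."""
    score = sum(lexicon.get(word, 0.0) for word in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical seed data: +1 marks a positive tweet, -1 a negative one.
SEED_TWEETS = [
    ("love this phone", 1),
    ("great battery, love it", 1),
    ("hate the screen", -1),
    ("terrible battery", -1),
]

lexicon = reduce_phase(map_phase(SEED_TWEETS))

for tweet in ("great phone", "terrible screen", "battery"):
    print(tweet, "->", classify(tweet, lexicon))
```

Because each word's score depends only on the pairs emitted for that word, the reduce step can be partitioned by word across machines, which is the property that lets this style of lexicon builder scale with cluster size. In this toy data, "battery" appears once in a positive tweet and once in a negative one, so its score averages to zero.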