TwiSent: a multistage system for analyzing sentiment in twitter

Authors:
Subhabrata Mukherjee;Akshat Malu;Balamurali A.R.;Pushpak Bhattacharyya
Affiliations:
IIT Bombay, Mumbai, India;IIT Bombay, Mumbai, India;IITB-Monash Research Academy, IIT Bombay, Mumbai, India;IIT Bombay, Mumbai, India
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

In this paper, we present TwiSent, a sentiment analysis system for Twitter. Based on the topic searched, TwiSent collects tweets pertaining to it and categorizes them into three polarity classes: positive, negative and objective. However, analyzing micro-blog posts poses many inherent challenges compared to other text genres. Through TwiSent, we address the problems of 1) spam pertaining to sentiment analysis in Twitter, 2) structural anomalies in the text in the form of incorrect spellings, nonstandard abbreviations, slang, etc., 3) entity specificity in the context of the topic searched and 4) pragmatics embedded in text. The system performance is evaluated on manually annotated gold standard data and on a tweet set annotated automatically based on hashtags. It is common practice to show the efficacy of a supervised system on an automatically annotated dataset. However, we show that such a system achieves lower classification accuracy when tested on a generic Twitter dataset. We also show that our system performs much better than an existing system.
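
To make the idea of hashtag-based automatic annotation concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not say which hashtags were used or how ambiguous tweets were handled, so the seed hashtag sets, the function name `label_by_hashtag`, and the rule of discarding tweets with mixed or missing seed hashtags are all hypothetical.

```python
import re

# Hypothetical seed hashtags; the abstract does not list the ones actually used.
POSITIVE_TAGS = {"#love", "#happy", "#awesome"}
NEGATIVE_TAGS = {"#hate", "#sad", "#fail"}
SEED_TAGS = POSITIVE_TAGS | NEGATIVE_TAGS

HASHTAG_RE = re.compile(r"#\w+")


def label_by_hashtag(tweet):
    """Assign a noisy polarity label to a tweet from its hashtags.

    Returns (label, cleaned_text), or None when the tweet carries no seed
    hashtag or carries seed hashtags of both polarities.
    """
    tags = {t.lower() for t in HASHTAG_RE.findall(tweet)}
    has_pos = bool(tags & POSITIVE_TAGS)
    has_neg = bool(tags & NEGATIVE_TAGS)
    if has_pos == has_neg:
        # Neither polarity or both: too ambiguous to label automatically.
        return None
    label = "positive" if has_pos else "negative"

    # Remove the seed hashtags so a classifier trained on this data
    # cannot simply memorise the labelling signal itself.
    def drop_seed(match):
        return "" if match.group(0).lower() in SEED_TAGS else match.group(0)

    cleaned = " ".join(HASHTAG_RE.sub(drop_seed, tweet).split())
    return label, cleaned
```

Labels produced this way are noisy by construction, which is consistent with the abstract's point that results on an automatically annotated set should be checked against manually annotated data.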