Sentiment analysis on evolving social streams: how self-report imbalances can help

Authors:
Pedro Calais Guerra;Wagner Meira, Jr.;Claire Cardie
Affiliations:
UFMG, Belo Horizonte, MG, Brazil;UFMG, Belo Horizonte, MG, Brazil;Cornell University, Ithaca, NY, USA
Venue:
Proceedings of the 7th ACM international conference on Web search and data mining
Year:
2014

Abstract

Real-time sentiment analysis is a challenging machine learning task due to the scarcity of labeled data and to sudden changes in sentiment caused by real-world events that need to be interpreted instantly. In this paper we propose solutions for acquiring labels and coping with concept drift in this setting by drawing on findings from social psychology about how humans prefer to disclose certain types of emotions. In particular, we use the findings that humans are more motivated to report positive feelings than negative ones, and that they prefer to report extreme feelings rather than average ones. We map each of these self-report imbalances onto one of two machine learning sub-tasks. The preference for disclosing positive feelings can be exploited to generate labeled data on polarizing topics, where a positive event for one group usually induces negative feelings in the opposing group; the resulting imbalance in user activity reveals the currently dominant sentiment. Based on the knowledge that extreme experiences are reported more often than average experiences, we propose a feature representation strategy that focuses on terms appearing during spikes in the social stream. We found that, compared with a static text representation (TF-IDF), our feature representation is better at detecting new informative features that capture the sudden changes in the sentiment stream caused by real-world events. We show that our social psychology-inspired framework achieves accuracies of up to 84% when analyzing live reactions in Twitter debates about two popular sports, soccer and football, despite requiring no human effort to generate supervisory labels.
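The activity-imbalance idea for labeling polarizing topics can be illustrated with a small sketch. It is not the authors' exact procedure. It assumes that each post's author is already known to support one of exactly two sides (the hypothetical `Post.group` field), that the stream is cut into time windows, and that each side's usual share of activity (`baseline_share`) is known. When one side posts far more than usual relative to the other (the hypothetical `ratio_threshold`), its posts in that window are labeled positive and the quieter side's posts negative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Post:
    group: str  # hypothetical: the side the author supports, e.g. "A" or "B"
    text: str


def label_window(
    posts: List[Post],
    baseline_share: Dict[str, float],
    ratio_threshold: float = 2.0,
) -> List[Tuple[str, str]]:
    """Weakly label one time window of posts on a two-sided (polarizing) topic.

    Returns (text, label) pairs, or [] when neither side clearly dominates.
    """
    if len(baseline_share) != 2:
        raise ValueError("this sketch assumes exactly two opposing groups")
    known = [p for p in posts if p.group in baseline_share]
    if not known:
        return []
    counts = Counter(p.group for p in known)
    total = len(known)
    # "Lift": how much more (or less) a group posts than its usual share of the stream.
    lift = {g: (counts[g] / total) / share for g, share in baseline_share.items()}
    dominant, other = sorted(lift, key=lift.get, reverse=True)
    # Require a clear imbalance; a completely silent opposing side (lift 0) always counts as one.
    if lift[other] > 0 and lift[dominant] / lift[other] < ratio_threshold:
        return []
    # Assumption: the more vocal side is reacting to good news, the quieter side to bad news.
    labels = {dominant: "positive", other: "negative"}
    return [(p.text, labels[p.group]) for p in known]
```

Normalizing by each side's baseline share keeps a naturally larger fan base from being read as permanently happy. The threshold value and the positive/negative assignment are illustrative choices, not parameters taken from the paper.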
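A spike-focused feature weighting can be contrasted with TF-IDF, which weights a term by its rarity across the whole collection. The following is a minimal sketch of one plausible reading, not the paper's exact formulation. It assumes that the stream is bucketed into fixed windows of tokenized posts, and that a spike is a window whose post count exceeds the mean of the previous `history` windows by `k` standard deviations. A term's weight is then the fraction of its document frequency that falls inside spike windows. The function names, `k`, `history`, and `min_df` are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List

Doc = List[str]  # a tokenized post


def spike_windows(volumes: List[int], k: float = 2.0, history: int = 5) -> List[int]:
    """Indices of windows whose volume exceeds mean + k*std of the preceding `history` windows."""
    spikes = []
    # The first `history` windows have no full history, so they are never flagged.
    for i in range(history, len(volumes)):
        past = volumes[i - history:i]
        # Floor sigma at 1 post so a perfectly flat history does not turn a +1 blip into a spike.
        sigma = max(pstdev(past), 1.0)
        if volumes[i] > mean(past) + k * sigma:
            spikes.append(i)
    return spikes


def spike_term_weights(
    windows: List[List[Doc]], k: float = 2.0, history: int = 5, min_df: int = 1
) -> Dict[str, float]:
    """Weight each term by the fraction of its document frequency that falls inside spike windows."""
    spikes = set(spike_windows([len(docs) for docs in windows], k, history))
    df_total: Counter = Counter()
    df_spike: Counter = Counter()
    for i, docs in enumerate(windows):
        for doc in docs:
            terms = set(doc)  # document frequency: count each term once per post
            df_total.update(terms)
            if i in spikes:
                df_spike.update(terms)
    return {t: df_spike[t] / n for t, n in df_total.items() if n >= min_df}
```

For example, a term that occurs only in a burst of posts (such as "goal" right after a score) gets weight 1.0. A term spread evenly over quiet and busy windows (such as "the") lands in between, and a term seen only in quiet windows gets 0.0. Recomputing these weights as new windows arrive lets freshly bursting terms gain weight immediately, which is the kind of behavior the abstract attributes to its spike-focused representation.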