How do we analyze sentiments over a set of opinionated Twitter messages? This question has been widely studied in recent years, with a prominent line of work based on classification techniques: messages are classified according to the implicit attitude of the writer with respect to a query term. A major concern, however, is that Twitter (and other media channels) follows the data stream model, so the classifier must operate with limited resources, including limited labeled data for training classification models. This imposes serious challenges on current classification techniques, which must be constantly fed with fresh training messages in order to track sentiment drift and provide up-to-date sentiment analysis. We propose solutions to this problem. The heart of our approach is a training-augmentation procedure that takes a small training seed as input and automatically incorporates new relevant messages into the training data. Classification models are produced on the fly using association rules, which are kept up to date incrementally, so that at any given time the model properly reflects the sentiments in the event being analyzed. To track sentiment drift, training messages are projected on a demand-driven basis, according to the content of the message being classified. Projecting the training data offers a series of advantages, including the ability to quickly detect trending information emerging in the stream. We analyzed major events in 2010 and show that prediction performance remains about the same, or even increases, as the stream progresses and new training messages are acquired. This result holds for different languages, even when the sentiment distribution changes over time or the initial training seed is rather small.
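The abstract combines three ideas: association rules mined from labeled messages, projection of the training data onto the vocabulary of the incoming message, and self-augmentation of the training set with confidently classified messages. The following is a minimal sketch of that loop, not the authors' implementation; the class name, the single-term rules, and the confidence thresholds are illustrative assumptions.

```python
from collections import Counter


class SentimentStreamClassifier:
    """Hypothetical sketch: demand-driven association rules + self-training."""

    def __init__(self, seed, min_conf=0.7):
        # seed: list of (tokens, label); grows incrementally as the stream passes
        self.training = list(seed)
        self.min_conf = min_conf

    def classify(self, tokens):
        tokens = set(tokens)
        # Demand-driven projection: keep only training messages that share
        # at least one term with the message being classified.
        projected = [(t, y) for t, y in self.training if tokens & set(t)]
        if not projected:
            return None, 0.0
        # Mine simple one-term rules (term -> label) from the projection,
        # keeping those whose confidence reaches min_conf.
        votes = Counter()
        for term in tokens:
            labels = [y for t, y in projected if term in t]
            if not labels:
                continue
            label, n = Counter(labels).most_common(1)[0]
            conf = n / len(labels)
            if conf >= self.min_conf:
                votes[label] += conf
        if not votes:
            return None, 0.0
        label, score = votes.most_common(1)[0]
        return label, score / sum(votes.values())

    def augment(self, tokens, threshold=0.9):
        # Training augmentation: confidently classified messages become
        # fresh training data, letting the model track sentiment drift.
        label, conf = self.classify(tokens)
        if label is not None and conf >= threshold:
            self.training.append((tuple(tokens), label))
        return label
```

A toy run: seed the classifier with a couple of labeled messages, classify, and feed confident predictions back in via `augment`, so the rules stay current as the stream evolves.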
We also derive lower bounds for prediction performance, and we show that our approach is highly effective under diverse learning scenarios, providing gains ranging from 7% to 58%.