Self-training from labeled features for sentiment analysis

Authors:
Yulan He; Deyu Zhou
Affiliations:
Knowledge Media Institute, Open University, Walton Hall, Milton Keynes MK7 6AA, UK; School of Computer Science and Engineering, Southeast University, Nanjing, China
Venue:
Information Processing and Management: an International Journal
Year:
2011

Abstract

Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge, defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort. In this paper, we propose a novel framework in which an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on the expected sentiment labels of those lexicon words expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie-review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.
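The pipeline described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. For brevity it swaps in two simpler techniques: a lexicon vote stands in for the initial classifier trained with generalized expectation criteria, and a naive-Bayes-style log-likelihood ratio stands in for the second classifier trained by constraining predictions on unlabeled instances. The toy lexicon, the documents, the `threshold` and `smoothing` parameters, and all function names are assumptions made for illustration only.

```python
from collections import Counter
import math

# Hypothetical toy lexicon; the paper draws on an existing sentiment lexicon.
LEXICON = {"good": "pos", "great": "pos", "bad": "neg", "awful": "neg"}


def lexicon_score(tokens):
    """Return the share of positive lexicon hits; 0.5 when no lexicon word occurs."""
    pos = sum(1 for t in tokens if LEXICON.get(t) == "pos")
    neg = sum(1 for t in tokens if LEXICON.get(t) == "neg")
    if pos + neg == 0:
        return 0.5
    return pos / (pos + neg)


def pseudo_label(docs, threshold=0.9):
    """Keep only documents that the initial (lexicon) classifier labels confidently."""
    labeled = []
    for tokens in docs:
        p = lexicon_score(tokens)
        if p >= threshold:
            labeled.append((tokens, "pos"))
        elif p <= 1 - threshold:
            labeled.append((tokens, "neg"))
    return labeled


def word_class_distributions(labeled, smoothing=1.0):
    """Estimate smoothed P(word | class) from the pseudo-labeled documents."""
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in labeled:
        counts[label].update(tokens)
    vocab = set(counts["pos"]) | set(counts["neg"])
    dists = {}
    for label in ("pos", "neg"):
        total = sum(counts[label].values()) + smoothing * len(vocab)
        dists[label] = {w: (counts[label][w] + smoothing) / total for w in vocab}
    return dists


def classify(tokens, dists):
    """Label a document by the summed log-likelihood ratio over known words."""
    score = sum(
        math.log(dists["pos"][t] / dists["neg"][t])
        for t in tokens
        if t in dists["pos"]
    )
    return "pos" if score >= 0 else "neg"


# Unlabeled toy corpus: the last two documents contain no lexicon words.
docs = [
    ["great", "plot", "gripping"],
    ["good", "gripping", "acting"],
    ["bad", "plot", "tedious"],
    ["awful", "tedious", "acting"],
    ["gripping", "story"],
    ["tedious", "story"],
]

labeled = pseudo_label(docs)
dists = word_class_distributions(labeled)
for tokens in docs[-2:]:
    print(tokens, classify(tokens, dists))
```

In this toy run, "gripping" and "tedious" are absent from the lexicon but acquire class-specific weight from the pseudo-labeled documents, which is the domain-specific feature acquisition step the abstract refers to.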