A machine learning approach to sentiment analysis in multilingual Web texts

Authors:
Erik Boiy; Marie-Francine Moens
Affiliations:
Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium;Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
Venue:
Information Retrieval
Year:
2009

Abstract

Sentiment analysis, also called opinion mining, is a form of information extraction from text of growing research and commercial interest. In this paper we present our machine learning experiments on sentiment analysis in blog, review and forum texts found on the World Wide Web and written in English, Dutch and French. We train on a set of example sentences or statements that are manually annotated as positive, negative or neutral with regard to a certain entity. We are interested in the feelings that people express about certain consumer products. We learn and evaluate several classification models that can be configured in a cascaded pipeline. We have to deal with several problems: the noisy character of the input texts, the attribution of the sentiment to a particular entity, and the small size of the training set. We succeed in identifying positive, negative and neutral feelings toward the entity under consideration with ca. 83% accuracy for English texts, based on unigram features augmented with linguistic features. The accuracy for the Dutch and French texts is ca. 70% and 68% respectively, due to the larger variety of linguistic expressions, which more often diverge from standard language and thus demand more training patterns. In addition, our experiments give us insights into the portability of the learned models across domains and languages. A substantial part of the article investigates the role of active learning techniques for reducing the number of examples to be manually annotated.
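
To make the idea of active learning over unigram features concrete, here is a minimal sketch in Python. It is not the authors' system: the abstract does not say which classifiers or which selection strategies were used, so the multinomial Naive Bayes model, the uncertainty-sampling rule, and every name below (`unigrams`, `UnigramNaiveBayes`, `most_uncertain`) are assumptions chosen purely for illustration. The toy sentences are invented as well and are not data from the paper.

```python
import math
from collections import Counter, defaultdict


def unigrams(text):
    """Lowercased whitespace tokens: a deliberately naive unigram extractor."""
    return text.lower().split()


class UnigramNaiveBayes:
    """Multinomial Naive Bayes over unigram counts with add-one smoothing.

    Hypothetical stand-in classifier; the paper's actual models may differ.
    """

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(unigrams(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, text):
        """Return a dict mapping each label to its posterior probability."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.labels:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for word in unigrams(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        # Normalize in a numerically stable way (subtract the maximum first).
        peak = max(log_scores.values())
        exp = {label: math.exp(s - peak) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


def most_uncertain(model, pool, k=1):
    """Uncertainty sampling: the k pool texts with the lowest top-class probability."""
    def confidence(text):
        return max(model.predict_proba(text).values())
    return sorted(pool, key=confidence)[:k]


# Toy, invented data: three annotated statements and two unlabeled ones.
labeled = [
    ("great camera love it", "positive"),
    ("terrible battery hate it", "negative"),
    ("the camera has a battery", "neutral"),
]
pool = ["love it great", "great camera terrible battery"]

model = UnigramNaiveBayes().fit(*zip(*labeled))
to_annotate = most_uncertain(model, pool, k=1)
print(to_annotate)
```

In an annotation loop, the selected statement would be labeled by a human, added to `labeled`, and the model retrained. The intent is that annotation effort goes to the examples the current model finds hardest rather than to examples it already classifies confidently.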