On developing robust models for favourability analysis: Model choice, feature sets and imbalanced data

Authors:
Peter C. R. Lane; Daoud Clarke; Paul Hender
Affiliations:
School of Computer Science, University of Hertfordshire, College Lane, Hatfield AL10 9AB, Hertfordshire, UK; School of Computer Science, University of Hertfordshire, College Lane, Hatfield AL10 9AB, Hertfordshire, UK and Metrica, Banner Street, London EC1V 9BJ, UK; Metrica, Banner Street, London EC1V 9BJ, UK
Venue:
Decision Support Systems
Year:
2012

Abstract

Locating documents that carry positive or negative favourability is an important application within media analysis. This article presents empirical results on the challenges facing a machine-learning approach to this kind of opinion mining. These challenges include the often considerable imbalance in the distribution of positive and negative samples, changes in the documents over time, and the need for effective training and evaluation procedures for the models. We report results on three data sets generated by a media-analysis company, classifying documents in two ways: detecting the presence of favourability, and distinguishing negative from positive favourability. We describe our experiments in developing a machine-learning approach to automate the classification process. We explore the effect of using five different types of features, the robustness of the models when tested on data taken from a later time period, and the effect of balancing the input data by undersampling. We find that the optimum choice of classifier, feature set and training strategy varies with the task and data set.