Compact features for sentiment analysis

  • Authors:
  • Lisa Gaudette;Nathalie Japkowicz

  • Affiliations:
  • School of Information Technology & Engineering, University of Ottawa, Ottawa, Ontario, Canada (both authors)

  • Venue:
  • Canadian AI'11 Proceedings of the 24th Canadian conference on Advances in artificial intelligence
  • Year:
  • 2011

Abstract

This work examines a novel method of constructing features for machine learning approaches to sentiment analysis and related tasks. Sentiment analysis is frequently approached using a "Bag of Words" representation (one feature for each word encountered in the training data), which can easily involve thousands of features. This paper describes a set of compact features developed by learning scores for words, dividing the range of possible scores into a number of bins, and then generating features based on the distribution of a document's scored words over the bins. This allows effective learning of sentiment and related tasks with only 25 features; in fact, performance was very often slightly better with these features than with a simple bag of words baseline. This large reduction in the number of features considerably reduces training time on large datasets and makes it possible to use much larger datasets than previously attempted with bag of words approaches, which in turn improves performance.
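
The three steps named in the abstract (score words, bin the score range, describe each document by how its scored words fall across the bins) can be illustrated with a minimal Python sketch. The abstract does not say how word scores are learned or how the 25 features map onto bins, so everything specific here is an assumption made for illustration: the score formula (a normalized difference of positive and negative counts), the equal-width bins over [-1, 1], and the hypothetical function names `learn_word_scores` and `bin_features`. It should not be read as the authors' method.

```python
from collections import Counter


def learn_word_scores(docs, labels):
    """Assumed scoring rule: (pos - neg) / (pos + neg), which lies in [-1, 1].

    docs is a list of token lists; labels holds 1 (positive) or 0 (negative).
    """
    pos, neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (pos if label == 1 else neg).update(tokens)
    scores = {}
    for word in set(pos) | set(neg):
        p, n = pos[word], neg[word]
        scores[word] = (p - n) / (p + n)
    return scores


def bin_features(tokens, scores, n_bins=5, lo=-1.0, hi=1.0):
    """Return the fraction of a document's scored words falling in each bin.

    Bins are equal-width over [lo, hi] (an assumption); words without a
    learned score are skipped.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for tok in tokens:
        if tok not in scores:
            continue
        # Clamp so that a score equal to hi lands in the last bin.
        idx = min(int((scores[tok] - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


# Toy data, invented purely for this example.
train_docs = [["good", "great", "film"], ["bad", "awful", "film"]]
train_labels = [1, 0]
scores = learn_word_scores(train_docs, train_labels)
features = bin_features(["good", "film", "bad", "unseen"], scores, n_bins=5)
```

The resulting fixed-length vector has `n_bins` entries regardless of vocabulary size, which is the property the abstract relies on: a downstream classifier sees a handful of features per document instead of one per word.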