A comparative study of feature selection and machine learning techniques for sentiment analysis

  • Authors:
  • Anuj Sharma;Shubhamoy Dey

  • Affiliations:
  • Indian Institute of Management, Prabandh Shikhar, Rau, Indore, India;Indian Institute of Management, Prabandh Shikhar, Rau, Indore, India

  • Venue:
  • Proceedings of the 2012 ACM Research in Applied Computation Symposium
  • Year:
  • 2012

Abstract

Sentiment analysis extracts opinion and subjectivity knowledge from user-generated text content. It differs from traditional topic-based text classification in that it classifies opinionated text according to the sentiment it conveys. Feature selection is a critical task in sentiment analysis, and effectively selected representative features from subjective text can improve sentiment-based classification. This paper explores the applicability of five feature selection methods commonly used in data mining research (Document Frequency (DF), Information Gain (IG), Gain Ratio (GR), Chi-square (CHI) and Relief-F) and seven machine learning based classification techniques (Naïve Bayes, Support Vector Machine, Maximum Entropy, Decision Tree, K-Nearest Neighbor, Winnow, Adaboost) for sentiment analysis on an online movie review dataset. The paper demonstrates that feature selection does improve the performance of sentiment-based classification, but that the improvement depends on the method adopted and the number of features selected. The experimental results presented in this paper show that Gain Ratio gives the best performance for sentiment feature selection, and that SVM performs better than the other techniques for sentiment-based classification.
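
The abstract does not give formulas for the feature selection measures, so the following is only a minimal sketch of how Information Gain and Gain Ratio are commonly defined for a binary "term present / term absent" feature. It is not the authors' implementation. The function names (`entropy`, `gain_scores`) and the four-document toy corpus are assumptions made up for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gain_scores(docs, labels, term):
    """Return (information gain, gain ratio) for presence/absence of `term`.

    docs   -- list of sets of tokens (hypothetical bag-of-words documents)
    labels -- list of class labels, one per document
    """
    n = len(docs)
    present = [lab for doc, lab in zip(docs, labels) if term in doc]
    absent = [lab for doc, lab in zip(docs, labels) if term not in doc]

    # Entropy of the class labels before splitting on the term.
    h_class = entropy(list(Counter(labels).values()))

    # Weighted entropy of the class labels after splitting on the term.
    h_cond = sum(
        (len(part) / n) * entropy(list(Counter(part).values()))
        for part in (present, absent)
        if part
    )
    info_gain = h_class - h_cond

    # Gain Ratio normalizes Information Gain by the entropy of the split itself.
    split_info = entropy([len(present), len(absent)])
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return info_gain, gain_ratio


# Toy, made-up corpus: two positive and two negative "reviews".
docs = [
    {"great", "movie"},
    {"great", "acting"},
    {"boring", "movie"},
    {"boring", "plot"},
]
labels = ["pos", "pos", "neg", "neg"]

# Rank the vocabulary by Gain Ratio, highest first.
vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda t: gain_scores(docs, labels, t)[1], reverse=True)
print(ranked)
```

On this toy data the sentiment-bearing terms "great" and "boring" separate the classes perfectly and rank first, while "movie" appears equally in both classes and ranks last. A feature selection step in this style would keep only the top-ranked terms before training a classifier.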