A feature selection method based on improved Fisher's discriminant ratio for text sentiment classification

  • Authors:
  • Suge Wang;Deyu Li;Xiaolei Song;Yingjie Wei;Hongxia Li

  • Affiliations:
  • School of Computer and Information Technology, Shanxi University, Taiyuan, 030006 Shanxi, China and Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of E ...;Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Taiyuan, 030006 Shanxi, China and School of Mathematics Science, Shanxi University, Taiyua ...;School of Computer and Information Technology, Shanxi University, Taiyuan, 030006 Shanxi, China;Science Press, 100717 Beijing, China;School of Computer and Information Technology, Shanxi University, Taiyuan, 030006 Shanxi, China

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Abstract

Owing to its openness, virtualization, and sharing, the Internet has rapidly become a platform for people to express their opinions, attitudes, feelings, and emotions. Because subjective texts are often too numerous for people to read through, automatically classifying them into sentiment orientation categories (e.g. positive/negative) has become an important research problem. In this paper, an effective feature selection method based on Fisher's discriminant ratio is proposed for sentiment classification of subjective texts. To validate the proposed method, we compare it with a method based on Information Gain, using a Support Vector Machine as the classifier. Two experiments are conducted by combining different feature selection methods with two kinds of candidate feature sets. On 2739 subjective documents from COAE2008s and 1006 car-related subjective documents, the experimental results indicate that Fisher's discriminant ratio based on word frequency estimation performs best, with accuracies of 86.61% and 82.80% on the two corpora respectively, when the candidate features are the words that appear in both positive and negative texts.
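
As a rough illustration of the idea, the sketch below scores words with the classical Fisher's discriminant ratio, (mean_pos - mean_neg)^2 / (var_pos + var_neg), computed over per-document word counts. It is not the improved variant proposed in the paper, whose exact estimation formulas are not given in this abstract. The function names `fisher_ratio_scores` and `select_top_k`, the use of population variance, and the small `eps` smoothing term are all assumptions made for this example. Candidate features are restricted to words that occur in both positive and negative texts, as the abstract describes.

```python
from collections import Counter


def fisher_ratio_scores(pos_docs, neg_docs, eps=1e-12):
    """Score words by the classical Fisher's discriminant ratio.

    pos_docs / neg_docs: lists of tokenized documents (lists of words).
    Returns {word: score} for words that occur in both classes.
    Hypothetical helper; not the paper's improved estimator.
    """

    def class_stats(docs, vocab):
        # Mean and (population) variance of each word's per-document count.
        counts = [Counter(doc) for doc in docs]
        n = len(docs)
        stats = {}
        for word in vocab:
            values = [c[word] for c in counts]
            mean = sum(values) / n
            var = sum((v - mean) ** 2 for v in values) / n
            stats[word] = (mean, var)
        return stats

    # Candidate features: words appearing in both positive and negative texts.
    pos_vocab = {w for doc in pos_docs for w in doc}
    neg_vocab = {w for doc in neg_docs for w in doc}
    candidates = pos_vocab & neg_vocab

    pos = class_stats(pos_docs, candidates)
    neg = class_stats(neg_docs, candidates)

    scores = {}
    for word in candidates:
        (mean_p, var_p), (mean_n, var_n) = pos[word], neg[word]
        # eps (an assumption) avoids division by zero for constant counts.
        scores[word] = (mean_p - mean_n) ** 2 / (var_p + var_n + eps)
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring words (hypothetical helper)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The selected words would then serve as the feature vocabulary for a classifier such as a Support Vector Machine; how the features are weighted at that stage is left open here.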