Feature subsumption for sentiment classification in multiple languages

  • Authors:
  • Zhongwu Zhai;Hua Xu;Jun Li;Peifa Jia

  • Affiliations:
  • State Key Laboratory on Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, CS&T Department, Tsinghua University, Beijing, P.R. China (all four authors)

  • Venue:
  • PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
  • Year:
  • 2010

Abstract

An open problem in machine learning-based sentiment classification is how to extract complex features that outperform simple features; figuring out which types of features are most valuable is another. Most studies focus primarily on character or word n-gram features, but substring-group features have never before been considered in sentiment classification. In this study, substring-group features are extracted and selected for sentiment classification by means of a transductive learning-based algorithm. To demonstrate generality, experiments were conducted on three open datasets in three different languages: Chinese, English and Spanish. The experimental results show that the proposed algorithm's performance is usually superior to the best performance reported in related work, and that the proposed feature subsumption algorithm for sentiment classification is multilingual. The results also show that, compared to the inductive learning-based algorithm, the transductive learning-based algorithm can significantly improve the performance of sentiment classification. As for term weighting, the experiments show that “tfidf-c” outperforms all other term weighting approaches in the proposed algorithm.
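
The abstract does not define “tfidf-c”. As a rough illustration only, the sketch below assumes it denotes tf-idf weighting followed by cosine (L2) normalisation, which may differ from the exact formula used in the paper. The function name `tfidf_cosine` and the use of plain word tokens as features are hypothetical; in the paper's setting the features would be substring groups rather than words.

```python
import math
from collections import Counter


def tfidf_cosine(docs):
    """Assumed reading of "tfidf-c": tf * log(N / df), then L2-normalised.

    docs: list of feature lists (word tokens here, as a stand-in).
    Returns one {feature: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each feature.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Raw term frequency times log inverse document frequency.
        raw = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        # Cosine normalisation; an all-zero vector is left as zeros.
        vectors.append({t: (w / norm if norm > 0 else 0.0) for t, w in raw.items()})
    return vectors


if __name__ == "__main__":
    reviews = [["good", "movie"], ["bad", "movie"]]
    for vec in tfidf_cosine(reviews):
        print(vec)
```

In this toy input, "movie" appears in every document, so its idf is zero and all of the weight in each vector falls on the polarity-bearing word.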