Adaptive co-training SVM for sentiment classification on tweets

  • Authors:
  • Shenghua Liu;Fuxin Li;Fangtao Li;Xueqi Cheng;Huawei Shen

  • Affiliations:
  • Institute of Computing Technology, CAS, Beijing, China;Institute of Computing Technology, CAS, Beijing, China;Google Inc., Mountain View, CA, USA;Institute of Computing Technology, CAS, Beijing, China;Institute of Computing Technology, CAS, Beijing, China

  • Venue:
  • Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Sentiment classification is an important problem in tweets mining. There lack labeled data and rating mechanism for generating them in Twitter service. And topics in Twitter are more diverse while sentiment classifiers always dedicate themselves to a specific domain or topic. Thus it is a challenge to make sentiment classification adaptive to diverse topics without sufficient labeled data. Therefore we formally propose an adaptive multiclass SVM model which transfers an initial common sentiment classifier to a topic-adaptive one. To tackle the tweet sparsity, non-text features are explored besides the conventional text features, which are intuitively split into two views. An iterative algorithm is proposed for solving this model by alternating among three steps: optimization, unlabeled data selection and adaptive feature expansion steps. The algorithm alternatively minimizes the margins of two independent objectives on different views to learn coefficient matrices, which are collaboratively used for unlabeled tweets selection from the topic that the algorithm is adapting to. And then topic-adaptive sentiment words are expended based on the above selection, in turn to help the first two steps find more confident and unlabeled tweets and boost the final performance. Comparing with the well-known supervised sentiment classifiers and semi-supervised approaches, our algorithm achieves promising increases in accuracy averagely on the 6 topics from public tweet corpus.