Features play a fundamental role in sentiment classification. The primary topic of this paper is how to select different types of features effectively to improve sentiment classification performance. N-gram features are commonly employed in text classification tasks; in this paper, sentiment words, substrings, substring-groups, and key-substring-groups, which have not previously been considered in sentiment classification, are also extracted as features. The extracted features are then compared and analyzed. To demonstrate generality, we conduct our experiments on two authoritative Chinese data sets from different domains. Our statistical analysis of the experimental results indicates the following: (1) different types of features possess different discriminative capabilities in Chinese sentiment classification; (2) character bigram features perform the best among the N-gram features; (3) substring-group features have greater potential to improve the performance of sentiment classification because they combine substrings of different lengths; (4) sentiment words or phrases extracted from existing sentiment lexicons are not effective for sentiment classification; (5) effective features are usually of varying lengths rather than a fixed length.
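To make the difference between fixed-length and varying-length features concrete, the following is a minimal sketch of character N-gram and substring extraction. It does not reproduce the paper's substring-group or key-substring-group construction, which the abstract does not specify; the function names `char_ngrams` and `substring_counts`, the length bounds, and the sample review string are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`, in order.

    Hypothetical helper: yields an empty list when `text` is shorter than n.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def substring_counts(text, min_len=1, max_len=3):
    """Count every contiguous substring with length in [min_len, max_len].

    This pools n-grams of several lengths into one feature bag. It is a
    simple stand-in for varying-length features, not the paper's method.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        counts.update(char_ngrams(text, n))
    return counts


# Illustrative input only (not taken from the paper's data sets).
review = "服务很好"

# Fixed-length features: character bigrams.
bigrams = char_ngrams(review, 2)

# Varying-length features: all substrings of length 1 to 3.
features = substring_counts(review, min_len=1, max_len=3)
```

Working at the character level avoids a word segmentation step, which is one plausible reason character N-grams are a common baseline for Chinese text. Pooling several lengths, as in `substring_counts`, lets a classifier weight short and long patterns side by side.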