Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.
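The NB log-count-ratio idea in (iii) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' released code: with binarized count vectors, smoothed class count vectors p and q give the ratio r = log((p/‖p‖₁)/(q/‖q‖₁)); multinomial NB classifies by sign(rᵀf + b), and the SVM variant instead trains a linear SVM on the elementwise product r ∘ f. The function names and toy vocabulary below are made up for the example.

```python
import numpy as np

def nb_log_count_ratio(X, y, alpha=1.0):
    """NB log-count ratio r over binarized features X (docs x terms), y in {0, 1}."""
    p = alpha + X[y == 1].sum(axis=0)  # smoothed positive-class term counts
    q = alpha + X[y == 0].sum(axis=0)  # smoothed negative-class term counts
    return np.log((p / p.sum()) / (q / q.sum()))

def mnb_predict(X, r, b):
    """Multinomial NB decision rule sign(r . f + b), returned as labels in {0, 1}."""
    return (X @ r + b > 0).astype(int)

# Toy data: columns = ["good", "great", "bad", "awful"], rows = documents.
X = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0],
              [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]])
y = np.array([1, 1, 1, 0, 0, 0])

r = nb_log_count_ratio(X, y)
b = np.log((y == 1).sum() / (y == 0).sum())  # class-prior bias term
pred = mnb_predict(X, r, b)

# The SVM variant would instead fit a linear SVM (e.g. via LIBLINEAR)
# on the scaled features r * f rather than on the raw counts.
X_scaled = X * r
```

On this toy set r comes out positive on the positive-sentiment words and negative on the negative ones, so MNB recovers every label; any further details of the SVM variant (regularization, weight interpolation) are left to the paper itself.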