Sentiment classification of web reviews and comments is an important and challenging task in web mining and data mining. This paper presents a novel approach that uses association rules for sentiment classification of web reviews. A new constraint measure, AD-Sup, is used to extract discriminative frequent term sets and to eliminate terms with no sentiment orientation, i.e., terms that occur with similar frequency in both positive and negative reviews. An optimal classification rule set is then generated by discarding any redundant general rule whose confidence is lower than that of a more specific rule. In the class-label prediction procedure, we propose a new metric-voting scheme to handle cases where the rules covering a test review are not sufficiently confident or not applicable; the final score of a test review combines the contributions of four metrics. Extensive experiments on multi-domain web datasets demonstrate that a minimum confidence of 50% yields classification rules that are both abundant and persuasive, and that the voting strategy improves on baselines that rely on confidence alone. A further comparison with popular machine learning algorithms such as SVM, Naïve Bayes, and kNN indicates that the proposed method outperforms these strong benchmarks.
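The abstract does not give the exact definition of AD-Sup or the pruning procedure, so the following is only a minimal sketch of the two ideas it describes: a hypothetical frequency-gap filter standing in for AD-Sup (keep terms whose relative frequency differs markedly between the two classes), and pruning of a general rule when a more specific rule with the same class label has higher confidence. Function names, the `min_gap` parameter, and the rule dictionary layout are illustrative assumptions, not the paper's notation.

```python
from collections import Counter

def discriminative_terms(pos_docs, neg_docs, min_gap=0.5):
    """Keep terms whose document frequency differs markedly between
    positive and negative reviews (hypothetical stand-in for AD-Sup)."""
    pos_counts = Counter(t for d in pos_docs for t in set(d))
    neg_counts = Counter(t for d in neg_docs for t in set(d))
    kept = set()
    for term in set(pos_counts) | set(neg_counts):
        p = pos_counts[term] / len(pos_docs)
        n = neg_counts[term] / len(neg_docs)
        # Discard terms with close frequency in both classes:
        # they carry no sentiment orientation.
        if abs(p - n) / max(p, n) >= min_gap:
            kept.add(term)
    return kept

def prune_redundant(rules):
    """Drop a general rule (antecedent is a subset of another rule's)
    when a more specific rule with the same label is more confident.
    Each rule is a dict: {"items": frozenset, "label": str, "conf": float}."""
    kept = []
    for r in rules:
        redundant = any(
            r is not s
            and r["items"] <= s["items"]
            and r["label"] == s["label"]
            and r["conf"] < s["conf"]
            for s in rules
        )
        if not redundant:
            kept.append(r)
    return kept
```

For example, a term like "movie" that appears in half of both the positive and negative reviews would be filtered out, while a class-skewed term like "good" survives; likewise the rule {good} → pos at confidence 0.6 would be pruned in favor of {good, great} → pos at 0.9.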