In the past few years, sentiment analysis and opinion mining have become popular and important tasks. Existing studies assume that their opinion resources are genuine and trustworthy. In practice, however, they may encounter fake opinions, known as opinion spam. In this paper, we study this issue in the context of our product review mining system. On product review sites, people may write fake reviews, called review spam, to promote their own products or to defame their competitors' products. It is important to identify and filter out such review spam. Previous work focuses only on heuristic rules, such as helpfulness voting or rating deviation, which limits performance on this task. We instead exploit machine learning methods to identify review spam. To this end, we manually build a spam collection from our crawled reviews. We first analyze the effect of various features on spam identification. We also observe that a review spammer consistently writes spam. This gives us a second view for identifying review spam: we can determine whether the author of a review is a spammer. Based on this observation, we propose a two-view semi-supervised method, co-training, to exploit the large amount of unlabeled data. The experimental results show that the proposed method is effective. Our machine learning methods achieve significant improvements over the heuristic baselines.
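The two-view idea can be illustrated with a minimal sketch. It is not the system described in the abstract: the nearest-centroid learner, the feature values, and the names `CentroidClassifier`, `co_train`, `REVIEW_VIEW`, and `REVIEWER_VIEW` are all assumptions made for illustration, and the loop is a simplified variant of co-training in which each view labels its most confident unlabeled examples on every round.

```python
from typing import Dict, List

Vector = List[float]


def centroid(rows: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidClassifier:
    """Toy nearest-centroid learner; a stand-in for any real classifier."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        self.c = {lab: centroid([x for x, t in zip(X, y) if t == lab])
                  for lab in (0, 1)}
        return self

    def score(self, x: Vector) -> float:
        """Positive when x is closer to the spam centroid (label 1)."""
        return dist2(x, self.c[0]) - dist2(x, self.c[1])


def co_train(view_a: List[Vector], view_b: List[Vector],
             seed: Dict[int, int], rounds: int = 5,
             per_round: int = 2) -> Dict[int, int]:
    """Grow the labeled set by letting each view label its most
    confident unlabeled examples; returns index -> label."""
    labels = dict(seed)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(view_a)) if i not in labels]
        if not unlabeled:
            break
        new: Dict[int, int] = {}
        idx = list(labels)
        for view in (view_a, view_b):
            clf = CentroidClassifier().fit([view[i] for i in idx],
                                           [labels[i] for i in idx])
            ranked = sorted(unlabeled, key=lambda i: abs(clf.score(view[i])),
                            reverse=True)
            for i in ranked[:per_round]:
                # First view to claim an example decides its label.
                new.setdefault(i, 1 if clf.score(view[i]) > 0 else 0)
        labels.update(new)
    return labels


# Hypothetical, made-up features. View A describes the review text,
# view B describes the reviewer's behaviour. Even indices are spam.
REVIEW_VIEW = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1],
               [0.7, 0.7], [0.3, 0.2], [0.85, 0.75], [0.15, 0.3]]
REVIEWER_VIEW = [[0.9, 0.9], [0.1, 0.1], [0.8, 0.7], [0.2, 0.2],
                 [0.9, 0.8], [0.1, 0.3], [0.7, 0.9], [0.2, 0.1]]

# Only review 0 (spam) and review 1 (genuine) start out labeled.
RESULT = co_train(REVIEW_VIEW, REVIEWER_VIEW, {0: 1, 1: 0})
print(RESULT)
```

On this toy data the two seed labels are enough for both views to label the remaining six reviews. With real reviews the choice of learner, the confidence threshold, and the number of examples added per round would need tuning on held-out labeled data.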