Learning to identify review spam

Authors:
Fangtao Li;Minlie Huang;Yi Yang;Xiaoyan Zhu
Affiliations:
State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Dept. of Computer Science and Technology, Tsinghua University, Beij ...;State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Dept. of Computer Science and Technology, Tsinghua University, Beij ...;State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Dept. of Computer Science and Technology, Tsinghua University, Beij ...;State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Dept. of Computer Science and Technology, Tsinghua University, Beij ...
Venue:
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
Year:
2011

Abstract

In the past few years, sentiment analysis and opinion mining have become popular and important tasks. These studies all assume that their opinion resources are genuine and trustworthy. However, they may encounter fake opinions, known as opinion spam. In this paper, we study this issue in the context of our product review mining system. On product review sites, people may write fake reviews, called review spam, to promote their own products or to defame their competitors' products. It is important to identify and filter out review spam. Previous work relies only on heuristic rules, such as helpfulness voting or rating deviation, which limits performance on this task. In this paper, we exploit machine learning methods to identify review spam. To this end, we manually build a spam collection from our crawled reviews. We first analyze the effect of various features on spam identification. We also observe that a review spammer consistently writes spam. This gives us another view for identifying review spam: we can identify whether the author of a review is a spammer. Based on this observation, we propose a two-view semi-supervised method, co-training, to exploit the large amount of unlabeled data. The experimental results show that the proposed method is effective. Our machine learning methods achieve significant improvements over the heuristic baselines.
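The two-view co-training idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the features or the base classifier, so everything here is an assumption: each example is a pair of feature vectors (a review view and a reviewer view), the base learner is a simple nearest-centroid classifier standing in for whatever the paper used, and the names `fit_centroids`, `predict`, and `co_train` are hypothetical.

```python
from math import dist


def fit_centroids(vectors, labels):
    """Train a nearest-centroid model for one view: {label: mean vector}."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return (label, confidence). Confidence is the distance margin
    between the nearest and second-nearest centroid (an assumed heuristic)."""
    ranked = sorted((dist(vec, c), label) for label, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of ((review_view, reviewer_view), label).
    unlabeled: list of (review_view, reviewer_view).
    Each round, the classifier for each view pseudo-labels its most
    confident unlabeled examples, which then join the shared training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        labels = [y for _, y in labeled]
        chosen = {}  # pool index -> pseudo-label
        for view in (0, 1):
            model = fit_centroids([x[view] for x, _ in labeled], labels)
            scored = sorted(
                ((predict(model, x[view]), i) for i, x in enumerate(pool)),
                key=lambda item: item[0][1],
                reverse=True,
            )
            for (label, _), i in scored[:per_round]:
                chosen.setdefault(i, label)
        labeled += [(pool[i], label) for i, label in chosen.items()]
        pool = [x for i, x in enumerate(pool) if i not in chosen]
    # Retrain one model per view on the expanded training set.
    labels = [y for _, y in labeled]
    models = [fit_centroids([x[v] for x, _ in labeled], labels) for v in (0, 1)]
    return labeled, models


# Toy data (invented for illustration): label 1 = spam, 0 = non-spam.
seed = [(([0.9, 0.8], [0.9]), 1), (([0.1, 0.2], [0.1]), 0)]
pool = [([0.8, 0.9], [0.8]), ([0.2, 0.1], [0.2]),
        ([0.7, 0.7], [0.95]), ([0.3, 0.2], [0.05])]
expanded, (review_model, reviewer_model) = co_train(seed, pool)
```

In this sketch both classifiers add to a single shared training set, which is one common way to set up co-training. A real system would replace the toy vectors with review-level and reviewer-level features and would likely use a probabilistic classifier so that confidence is better calibrated.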