Synthetic review spamming and defense

Authors:
Huan Sun;Alex Morales;Xifeng Yan
Affiliations:
University of California, Santa Barbara, Santa Barbara, CA, USA;University of California, Santa Barbara, Santa Barbara, CA, USA;University of California, Santa Barbara, Santa Barbara, CA, USA
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013

Citing 14
Cited 0

Identifying and Filtering Near-Duplicate Documents

COM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
Latent dirichlet allocation

The Journal of Machine Learning Research
Detecting phrase-level duplication on the world wide web

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Plagiarism analysis, authorship identification, and near-duplicate detection PAN'07

ACM SIGIR Forum
Opinion spam and analysis

WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Finding text reuse on the web

Proceedings of the Second ACM International Conference on Web Search and Data Mining
Modeling and Predicting the Helpfulness of Online Reviews

ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Automatic evaluation of text coherence: models and representations

IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
The WEKA data mining software: an update

ACM SIGKDD Explorations Newsletter
Detecting product review spammers using rating behaviors

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Generating phrasal and sentential paraphrases: A survey of data-driven methods

Computational Linguistics
Finding deceptive opinion spam by any stretch of the imagination

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Automatically evaluating text coherence using discourse relations

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Learning sentential paraphrases from bilingual parallel corpora for text-to-text generation

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Online reviews have been popularly adopted in many applications. Since they can either promote or harm the reputation of a product or a service, buying and selling fake reviews becomes a profitable business and a big threat. In this paper, we introduce a very simple, but powerful review spamming technique that could fail the existing feature-based detection algorithms easily. It uses one truthful review as a template, and replaces its sentences with those from other reviews in a repository. Fake reviews generated by this mechanism are extremely hard to detect: Both the state-of-the-art computational approaches and human readers acquire an error rate of 35%-48%, just slightly better than a random guess. While it is challenging to detect such fake reviews, we have made solid progress in suppressing them. A novel defense method that leverages the difference of semantic flows between synthetic and truthful reviews is developed, which is able to reduce the detection error rate to approximately 22%, a significant improvement over the performance of existing approaches. Nevertheless, it is still a challenging research task to further decrease the error rate. Synthetic Review Spamming Demo: www.cs.ucsb.edu/~alex_morales/reviewspam/