Online reviews have been widely adopted in many applications. Because they can either promote or harm the reputation of a product or service, buying and selling fake reviews has become a profitable business and a serious threat. In this paper, we introduce a very simple but powerful review spamming technique that can easily defeat existing feature-based detection algorithms. It uses one truthful review as a template and replaces its sentences with sentences taken from other reviews in a repository. Fake reviews generated by this mechanism are extremely hard to detect: both state-of-the-art computational approaches and human readers have error rates of 35% to 48%, only slightly better than random guessing. While detecting such fake reviews is challenging, we have made solid progress in suppressing them. We develop a novel defense method that leverages the difference in semantic flow between synthetic and truthful reviews; it reduces the detection error rate to approximately 22%, a significant improvement over existing approaches. Nevertheless, further decreasing the error rate remains a challenging research task.

Synthetic Review Spamming Demo: www.cs.ucsb.edu/~alex_morales/reviewspam/
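To make the two ideas in the abstract concrete, here is a minimal sketch under stated assumptions; it is not the authors' method. The attack is approximated by replacing each template sentence with the repository sentence of highest bag-of-words cosine similarity (the abstract does not say how replacement sentences are chosen, so that choice is an assumption). "Semantic flow" is approximated by the lexical similarity between consecutive sentences, a crude stand-in for whatever richer representation the paper uses. All function names (`sentences`, `bag_of_words`, `cosine`, `synthesize`, `flow_profile`, `flow_score`) are hypothetical.

```python
import math
import re
from collections import Counter


def sentences(text):
    """Split text on '.', '!' or '?' followed by whitespace (a crude heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def bag_of_words(sentence):
    """Lower-cased word counts for one sentence."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def synthesize(template, repository):
    """Hypothetical sketch of the template attack: swap every template sentence
    for the most lexically similar sentence from other reviews.
    Assumes `repository` contains at least one sentence."""
    pool = [s for review in repository for s in sentences(review)]
    out = []
    for s in sentences(template):
        bow = bag_of_words(s)
        out.append(max(pool, key=lambda cand: cosine(bow, bag_of_words(cand))))
    return " ".join(out)


def flow_profile(text):
    """Similarity of each consecutive sentence pair: an assumed lexical proxy
    for how a review's 'semantic flow' moves from sentence to sentence."""
    bows = [bag_of_words(s) for s in sentences(text)]
    return [cosine(x, y) for x, y in zip(bows, bows[1:])]


def flow_score(text):
    """Mean consecutive-sentence similarity; 0.0 for fewer than two sentences."""
    profile = flow_profile(text)
    return sum(profile) / len(profile) if profile else 0.0
```

Because each swapped-in sentence is chosen to match its template sentence locally, a stitched review can read plausibly sentence by sentence while its transitions come from unrelated sources; features built from something like `flow_profile` are one assumed way a classifier could pick up on that, which is the intuition the abstract attributes to the semantic-flow defense.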