Text mining and probabilistic language modeling for online review spam detection

Authors:
Raymond Y. K. Lau;S. Y. Liao;Ron Chi-Wai Kwok;Kaiquan Xu;Yunqing Xia;Yuefeng Li
Affiliations:
City University of Hong Kong, China;City University of Hong Kong, China;City University of Hong Kong, China;Nanjing University, China;Tsinghua University, China;Queensland University of Technology, Australia
Venue:
ACM Transactions on Management Information Systems (TMIS)
Year:
2012

Abstract

In the era of Web 2.0, huge volumes of consumer reviews are posted to the Internet every day. Manual approaches to detecting and analyzing fake reviews (i.e., spam) are not practical due to the problem of information overload. However, the design and development of automated methods of detecting fake reviews is a challenging research problem. The main reason is that fake reviews are specifically composed to mislead readers, so they may appear the same as legitimate reviews (i.e., ham). As a result, discriminatory features that would enable individual reviews to be classified as spam or ham may not be available. Guided by the design science research methodology, the main contribution of this study is the design and instantiation of novel computational models for detecting fake reviews. In particular, a novel text mining model is developed and integrated into a semantic language model for the detection of untruthful reviews. The models are then evaluated based on a real-world dataset collected from amazon.com. The results of our experiments confirm that the proposed models outperform other well-known baseline models in detecting fake reviews. To the best of our knowledge, the work discussed in this article represents the first successful attempt to apply text mining methods and semantic language models to the detection of fake consumer reviews. A managerial implication of our research is that firms can apply our design artifacts to monitor online consumer reviews to develop effective marketing or product design strategies based on genuine consumer feedback posted to the Internet.
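The abstract does not spell out the models themselves, so the following is only a generic illustration of how a probabilistic language model can be used to compare reviews, not the authors' method. It is a minimal sketch that assumes a Jelinek-Mercer smoothed unigram model per review and a symmetric Kullback-Leibler divergence as the distance; the function names, the smoothing weight `lam`, and the three sample reviews are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize far more carefully.
    return text.lower().split()


def unigram_lm(tokens, background, lam=0.5):
    # Jelinek-Mercer smoothing: P(w|d) = (1 - lam) * tf(w, d) / |d| + lam * P(w|C).
    # `lam` is an illustrative value, not one taken from the paper.
    counts = Counter(tokens)
    total = len(tokens)
    return {
        w: (1 - lam) * counts[w] / total + lam * p_bg
        for w, p_bg in background.items()
    }


def kl_divergence(p, q):
    # KL(p || q) over a shared vocabulary; smoothing keeps every q[w] positive.
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def symmetric_kl(p, q):
    # Symmetrized divergence: a small value means similar word distributions.
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical reviews: the first two are near-duplicates, the third is unrelated.
reviews = [
    "great camera with sharp pictures and long battery life",
    "great camera with sharp photos and long battery life",
    "the strap broke after one week and support never replied",
]

tokenized = [tokenize(r) for r in reviews]

# Background (collection) model estimated from all reviews together.
bg_counts = Counter(w for toks in tokenized for w in toks)
bg_total = sum(bg_counts.values())
background = {w: c / bg_total for w, c in bg_counts.items()}

models = [unigram_lm(toks, background) for toks in tokenized]

near_dup = symmetric_kl(models[0], models[1])
unrelated = symmetric_kl(models[0], models[2])
print(f"near-duplicate pair: {near_dup:.3f}, unrelated pair: {unrelated:.3f}")
```

In this sketch the near-duplicate pair yields a smaller divergence than the unrelated pair, which is the kind of signal a detector could threshold on. A purely lexical model like this one treats "pictures" and "photos" as unrelated words, which hints at why the abstract speaks of a semantic language model.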