A Self-Supervised Approach to Comment Spam Detection Based on Content Analysis

Authors:
A. Bhattarai; D. Dasgupta
Affiliations:
University of Memphis, USA; University of Memphis, USA
Venue:
International Journal of Information Security and Privacy
Year:
2011

Abstract

This paper studies the problems and threats posed by blog comment spam, a type of spam found in the blogosphere. It explores the challenges that comment spam introduces and generalizes the analysis broadly to other kinds of short-text spam. The authors analyze several high-level features of spam and legitimate comments, derived from the content of blog postings, and use the K-means algorithm to cluster the data separately on each feature. They then apply self-supervised learning to classify comments as spam or legitimate automatically. Compared with existing solutions, the approach is more flexible and adaptable to its environment because it requires minimal human intervention. A preliminary evaluation of the proposed spam detection system shows promising results.
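The abstract outlines only the shape of the pipeline: cluster each content feature separately with K-means, then classify comments without hand-labelled data. The following Python sketch shows one way such a pipeline could be wired together under stated assumptions. The three features, the direction in which each feature is assumed to indicate spam, k = 2 per feature, majority voting to turn clusters into pseudo-labels, and the nearest-centroid classifier trained on those pseudo-labels are all hypothetical choices made for illustration, not the authors' method.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-comment features (assumptions, NOT taken from the paper):
#   0: number of links in the comment        (spam assumed to be higher)
#   1: word-overlap ratio with the blog post  (spam assumed to be lower)
#   2: fraction of uppercase characters       (spam assumed to be higher)
# +1 means "larger values look more like spam", -1 means "smaller values do".
SPAM_DIRECTION = [+1, -1, +1]


def kmeans_1d(values: Sequence[float], iters: int = 20) -> Tuple[List[int], List[float]]:
    """Two-cluster K-means on a single feature; returns (assignments, centroids)."""
    centroids = [min(values), max(values)]  # deterministic initialisation
    assign = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        assign = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1 for v in values]
        # Update step: move each centroid to the mean of its members.
        new = []
        for k in (0, 1):
            members = [v for v, a in zip(values, assign) if a == k]
            new.append(sum(members) / len(members) if members else centroids[k])
        if new == centroids:  # converged
            break
        centroids = new
    return assign, centroids


def pseudo_label(rows: Sequence[Sequence[float]]) -> List[int]:
    """Cluster each feature separately, then majority-vote the per-feature spam flags."""
    votes = [0] * len(rows)
    for j, direction in enumerate(SPAM_DIRECTION):
        assign, centroids = kmeans_1d([r[j] for r in rows])
        # The "spam" cluster is the one whose centroid lies further in the spam direction.
        spam_cluster = 1 if direction * centroids[1] > direction * centroids[0] else 0
        for i, a in enumerate(assign):
            votes[i] += 1 if a == spam_cluster else 0
    return [1 if 2 * v > len(SPAM_DIRECTION) else 0 for v in votes]  # 1 = spam


def train_nearest_centroid(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Fit one mean vector per pseudo-label; no human-labelled data is used."""
    dim = len(rows[0])
    model: Dict[int, List[float]] = {}
    for lab in set(labels):
        members = [r for r, l in zip(rows, labels) if l == lab]
        model[lab] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return model


def classify(model: Dict[int, List[float]], row: Sequence[float]) -> int:
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, model[lab])))


# Toy, made-up comment features for illustration only.
comments = [
    [0, 0.60, 0.02],
    [1, 0.45, 0.03],
    [0, 0.70, 0.01],
    [6, 0.05, 0.30],
    [8, 0.00, 0.25],
    [5, 0.10, 0.40],
]
labels = pseudo_label(comments)
model = train_nearest_centroid(comments, labels)
print(labels)                            # [0, 0, 0, 1, 1, 1]
print(classify(model, [7, 0.02, 0.35]))  # 1
```

Clustering one feature at a time keeps each K-means run one-dimensional and lets a weak feature be outvoted by the others. Because the classifier uses raw Euclidean distance, a feature with a large range (here the link count) dominates the other two, so a practical system would likely normalize features before the final step.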