Comment spam classification in blogs through comment analysis and comment-blog post relationships

Authors:
Ashwin Rajadesingan; Anand Mahendran
Affiliations:
School of Computer Science and Engineering, VIT University, Vellore, TN, India; School of Computer Science and Engineering, VIT University, Vellore, TN, India
Venue:
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Year:
2012

Abstract

Spamming is the practice of pushing unwanted and irrelevant information to users. It is widespread in e-mail, instant messages, blogs and forums. In this paper, we consider the problem of spam in blogs. Spammers usually target the commenting systems that blog authors provide to facilitate interaction with readers, abusing them to post irrelevant and unsolicited content in the form of spam comments. We propose a novel methodology that classifies comments as spam or non-spam using previously undescribed features, including certain blog post-comment relationships. Experiments conducted using our methodology produced a spam detection accuracy of 94.82%, with a precision of 96.50% and a recall of 95.80%.
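For readers unfamiliar with the three reported measures, the following is a minimal sketch of how accuracy, precision and recall are conventionally computed from a binary confusion matrix, with spam treated as the positive class. The function name `classification_metrics` and the counts used below are hypothetical illustrations; they are not taken from the paper, and the abstract does not state how the authors computed their figures.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall), treating spam as the positive class.

    tp: spam comments correctly flagged as spam
    fp: legitimate comments wrongly flagged as spam
    tn: legitimate comments correctly left alone
    fn: spam comments that were missed
    """
    total = tp + fp + tn + fn
    # Accuracy: share of all comments that were classified correctly.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the comments flagged as spam, how many really were spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the actual spam comments, how many were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for illustration only (not the paper's data).
accuracy, precision, recall = classification_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```

Reporting precision and recall alongside accuracy matters for spam filtering because the two error types have different costs: a false positive hides a legitimate reader comment, while a false negative lets a spam comment through.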