SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
NLTK: the Natural Language Toolkit
ETMTNLP '02 Proceedings of the ACL-02 Workshop on Effective tools and methodologies for teaching natural language processing and computational linguistics - Volume 1
Spam filtering for short messages
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
A study of cross-validation and bootstrap for accuracy estimation and model selection
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
CAPTCHA: using hard AI problems for security
EUROCRYPT'03 Proceedings of the 22nd international conference on Theory and applications of cryptographic techniques
Recognizing objects in adversarial clutter: breaking a visual captcha
CVPR'03 Proceedings of the 2003 IEEE computer society conference on Computer vision and pattern recognition
Support vector machines for spam categorization
IEEE Transactions on Neural Networks
Longtime behavior of harvesting spam bots
Proceedings of the 2012 ACM conference on Internet measurement conference
Hi-index | 0.00 |
Spamming refers to the process of providing unwanted and irrelevant information to the users. It is a widespread phenomenon that is often noticed in e-mails, instant messages, blogs and forums. In our paper, we consider the problem of spamming in blogs. In blogs, spammers usually target commenting systems which are provided by the authors to facilitate interaction with the readers. Unfortunately, spammers abuse these commenting systems by posting irrelevant and unsolicited content in the form of spam comments. Thus, we propose a novel methodology to classify comments into spam and non-spam using previously-undescribed features including certain blog post-comment relationships. Experiments conducted using our methodology produced a spam detection accuracy of 94.82% with a precision of 96.50% and a recall of 95.80%.