This paper studies the problems and threats posed by blog comment spam, a type of spam in the blogosphere. It explores the challenges that comment spam introduces and generalizes the analysis to other kinds of short-text spam. The authors analyze several high-level features of spam and legitimate comments, based on the content of the blog postings. They cluster the data separately for each feature using the K-Means clustering algorithm, and then apply self-supervised learning to classify spam and legitimate comments automatically. Compared with existing solutions, this approach is more flexible and adapts better to its environment, because it requires minimal human intervention. A preliminary evaluation of the proposed spam detection system shows promising results.
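The per-feature clustering step can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two features (link count and word overlap with the post), the sample values, the function names, and the majority vote that merges the per-feature clusters into pseudo-labels are all assumptions made for illustration; the abstract does not say which features are used or how the clusters are combined.

```python
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int = 2, iters: int = 50) -> List[int]:
    """Cluster scalar values with Lloyd's algorithm (requires k >= 2).

    Returns one cluster index per value. Centroids start evenly spread
    over the value range, so index 0 is the lowest-valued cluster.
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid for every value.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def pseudo_labels(rows: Sequence[Sequence[float]],
                  spam_is_high: Sequence[bool]) -> List[bool]:
    """Cluster each feature column separately, then majority-vote.

    ASSUMPTION: the vote is a hypothetical way to merge the per-feature
    clusters; spam_is_high[f] says whether the high-valued cluster of
    feature f is treated as the spam-like one.
    """
    per_feature = [kmeans_1d(col) for col in zip(*rows)]
    result = []
    for i in range(len(rows)):
        votes = sum(
            1
            for f, clusters in enumerate(per_feature)
            if (clusters[i] == 1) == spam_is_high[f]
        )
        result.append(votes * 2 > len(per_feature))
    return result


# Hypothetical comments: (number of links, word overlap with the post).
comments = [(0, 0.60), (1, 0.55), (0, 0.70), (6, 0.05), (8, 0.10), (7, 0.00)]
labels = pseudo_labels(comments, spam_is_high=[True, False])
print(labels)
```

Pseudo-labels of this kind could then serve as training data for a classifier without manual annotation, which is one plausible reading of the self-supervised step described above.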