Semi-supervised spam filtering: does it work?

Authors:
Mona Mojdeh;Gordon V. Cormack
Affiliations:
University of Waterloo, Waterloo, ON, Canada;University of Waterloo, Waterloo, ON, Canada
Venue:
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2008


Abstract

The results of the 2006 ECML/PKDD Discovery Challenge suggest that semi-supervised learning methods work well for spam filtering when the source of the available labeled examples differs from the source of the messages to be classified. We have attempted to reproduce these results using data from the 2005 and 2007 TREC Spam Track, and have found the opposite effect: methods like self-training and transductive support vector machines yield classifiers inferior to those constructed using supervised learning on the labeled data alone. We investigate differences between the ECML/PKDD and TREC data sets and methodologies that may account for the opposite results.