The results of the 2006 ECML/PKDD Discovery Challenge suggest that semi-supervised learning methods work well for spam filtering when the labeled examples come from a different source than the messages to be classified. We attempted to reproduce these results using data from the 2005 and 2007 TREC Spam Track and found the opposite effect: methods such as self-training and transductive support vector machines yield classifiers inferior to those constructed using supervised learning on the labeled data alone. We investigate differences between the ECML/PKDD and TREC data sets and methodologies that may account for the conflicting results.