Spam filter evaluation with imprecise ground truth

Authors:
Gordon V. Cormack; Aleksander Kolcz
Affiliations:
University of Waterloo, Waterloo, ON, Canada; Microsoft Live Labs, Redmond, WA, USA
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Abstract

When trained and evaluated on accurately labeled datasets, online email spam filters are remarkably effective, achieving error rates an order of magnitude better than classifiers in similar applications. But labels acquired from user feedback or third-party adjudication exhibit higher error rates than the best filters, even filters trained using the same source of labels. It is appropriate to use naturally occurring labels, errors included, as training data in evaluating spam filters. Erroneous labels are problematic, however, when used as ground truth to measure filter effectiveness: any measurement of the filter's error rate will be inflated, and perhaps masked, by the label error rate. Using two natural sources of labels, we demonstrate automatic and semi-automatic methods that reduce the influence of labeling errors on evaluation, yielding substantially more precise measurements of true filter error rates.
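To make the masking effect concrete, the sketch below uses a simple textbook model that is an assumption here, not the authors' automatic or semi-automatic methods (which the abstract does not detail). It treats classification as binary and assumes filter errors and label errors are independent. Under that model, a filter verdict disagrees with a noisy label exactly when one of the two is wrong, so the measured rate is f(1 - l) + l(1 - f), which can be inverted when the label error rate l is known and below 0.5. The function names `observed_disagreement`, `corrected_error`, and `simulate` are hypothetical. In practice label errors may well correlate with filter errors, for example when the filter is trained on the same label source, and that correlation breaks this simple correction.

```python
import random


def observed_disagreement(filter_err: float, label_err: float) -> float:
    """Expected measured error rate when noisy labels serve as ground truth.

    Assumes a binary task with filter errors independent of label errors.
    The filter is "wrong" against the label when exactly one of the two is wrong.
    """
    return filter_err * (1 - label_err) + label_err * (1 - filter_err)


def corrected_error(observed: float, label_err: float) -> float:
    """Invert observed_disagreement to estimate the filter's true error rate.

    Only valid under the same independence assumption, with a known
    (or separately estimated) label error rate below 0.5.
    """
    if not 0 <= label_err < 0.5:
        raise ValueError("label_err must be in [0, 0.5)")
    return (observed - label_err) / (1 - 2 * label_err)


def simulate(n: int, filter_err: float, label_err: float, seed: int = 0) -> float:
    """Monte Carlo check: fraction of n messages where verdict != noisy label."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(n):
        truth = rng.random() < 0.5  # True = spam; class balance is arbitrary here
        # Each of the filter and the label independently flips the truth.
        verdict = truth if rng.random() >= filter_err else not truth
        label = truth if rng.random() >= label_err else not truth
        disagreements += verdict != label
    return disagreements / n


if __name__ == "__main__":
    f, l = 0.001, 0.01  # illustrative rates, not figures from the paper
    measured = observed_disagreement(f, l)
    print("measured:", measured)
    print("simulated:", simulate(200_000, f, l))
    print("corrected:", corrected_error(measured, l))
```

With a hypothetical filter error of 0.1% and label error of 1%, the measured rate is about 1.1%, roughly eleven times the true rate, so the measurement mostly reflects the labels rather than the filter. The correction recovers 0.1% only because the model's independence assumption holds exactly in this toy setting.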