Semi-supervised spam filtering using aggressive consistency learning

  • Authors:
  • Mona Mojdeh;Gordon V. Cormack

  • Affiliations:
  • University of Waterloo, Waterloo, ON, Canada;University of Waterloo, Waterloo, ON, Canada

  • Venue:
  • Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

A graph based semi-supervised method for email spam filtering, based on the local and global consistency method, yields low error rates with very few labeled examples. The motivating application of this method is spam filters with access to very few labeled message. For example, during the initial deployment of a spam filter, only a handful of labeled examples are available but unlabeled examples are plentiful. We demonstrate the performance of our approach on TREC 2007 and CEAS 2008 email corpora. Our results compare favorably with the best-known methods, using as few as just two labeled examples: one spam and one non-spam.