On the relative age of spam and ham training samples for email filtering

  • Authors:
  • Gordon V. Cormack; Jose-Marcio Martins da Cruz

  • Affiliations:
  • University of Waterloo, Waterloo, ON, Canada; Ecole des Mines de Paris, Paris, France

  • Venue:
  • Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2009

Abstract

Email spam filters are commonly trained on a sample of spam and ham (non-spam) messages. We investigate how filter performance is affected when the spam and ham training samples consist of messages sent months before those to be filtered. Our results show that filter performance deteriorates as the spam and ham samples age, but that the rate of deterioration differs between spam and ham. Spam and ham samples of different ages may be mixed to advantage, provided temporal cues are elided.
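The abstract does not say which temporal cues are elided or how. As an illustration only, the sketch below assumes that the cues of interest are date-bearing message headers. It strips a hypothetical list of such headers (`TEMPORAL_HEADERS`) before a message is used for training. The function name `elide_temporal_cues` and the header list are assumptions, not the authors' procedure. Dates that appear in the message body are left untouched.

```python
from email import message_from_string
from email.message import Message

# Hypothetical set of headers treated as temporal cues (an assumption;
# the paper's actual choice of cues is not given in the abstract).
TEMPORAL_HEADERS = ("Date", "Received", "Delivery-Date", "Resent-Date")


def elide_temporal_cues(raw: str) -> str:
    """Return the message text with the assumed date-bearing headers removed.

    A minimal sketch: only headers are stripped; dates inside the body remain.
    """
    msg: Message = message_from_string(raw)
    for name in TEMPORAL_HEADERS:
        # Deletes every occurrence of the header; no error if it is absent.
        del msg[name]
    return msg.as_string()
```

Applied to both the older and the newer training messages before they are mixed, this keeps the filter from learning header dates as a proxy for the spam/ham label. Other cues, such as timestamps in the body, would need separate handling.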