Spam filtering for short messages

  • Authors:
  • Gordon V. Cormack; José María Gómez Hidalgo; Enrique Puertas Sánz

  • Affiliations:
  • University of Waterloo, Waterloo, ON, Canada; Universidad Europea de Madrid, Villaviciosa de Odón, Madrid, Spain; Universidad Europea de Madrid, Villaviciosa de Odón, Madrid, Spain

  • Venue:
  • Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
  • Year:
  • 2007

Abstract

We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages often consist of only a few words, and therefore present a challenge to traditional bag-of-words based spam filters. Using three corpora of short messages and message fields derived from real SMS, blog, and spam messages, we evaluate feature-based and compression-model-based spam filters. We observe that bag-of-words filters can be improved substantially using different features, while compression-model filters perform quite well as-is. We conclude that content filtering for short messages is surprisingly effective.
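
To make the idea of a compression-model-based filter concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' method: it uses Python's general-purpose zlib compressor as a stand-in for a compression model, and the names compressed_size, extra_bytes, classify, and TOY_CORPORA are hypothetical, as is the toy data. A message is assigned to the class whose training text lets it be encoded in the fewest additional bytes.

```python
import zlib

# Toy training text per class (hypothetical data, for illustration only).
TOY_CORPORA = {
    "spam": "\n".join([
        "WIN a FREE prize now, text CLAIM to 80085",
        "URGENT! Your account has been selected for a cash reward",
        "Cheap loans approved instantly, reply YES",
    ]),
    "ham": "\n".join([
        "are we still meeting for lunch tomorrow at noon?",
        "running late, see you at home tonight",
        "can you send me the notes from class",
    ]),
}


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, message: str) -> int:
    """Additional compressed bytes needed to encode message after corpus."""
    return compressed_size(corpus + "\n" + message) - compressed_size(corpus)


def classify(message: str, corpora: dict) -> str:
    """Return the label whose corpus compresses the message most cheaply."""
    return min(corpora, key=lambda label: extra_bytes(corpora[label], message))


if __name__ == "__main__":
    for msg in ("WIN a FREE prize now, text CLAIM to 80085",
                "are we still meeting for lunch tomorrow at noon?"):
        print(msg, "->", classify(msg, TOY_CORPORA))
```

Because the compressor works on raw characters rather than word tokens, this kind of scoring does not depend on a message containing many words, which is one plausible reading of why such filters cope with very short texts. A practical filter would need far more training data and an adaptive model built for prediction; zlib here only demonstrates the principle.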