$100,000 prize jackpot. call now!: identifying the pertinent features of SMS spam

  • Authors:
  • Henry Tan; Nazli Goharian; Micah Sherr

  • Affiliations:
  • Georgetown University, Washington, DC, USA; Georgetown University, Washington, DC, USA; Georgetown University, Washington, DC, USA

  • Venue:
  • SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2012

Abstract

Mobile SMS spam is on the rise and is a prevalent problem. While recent work has shown that simple machine learning techniques can distinguish between ham and spam with high accuracy, this paper explores the individual contributions of various textual features in the classification process. Our results reveal the surprising finding that simple is better: using the largest spam corpus of which we are aware, we find that using simple textual features is sufficient to provide accuracy that is nearly identical to that achieved by the best known techniques, while achieving a twofold speedup.