Adaptive Spam Filtering Using Dynamic Feature Space

  • Authors:
  • Yan Zhou;Madhuri S. Mulekar;Praveen Nerellapalli

  • Affiliations:
  • University of South Alabama;University of South Alabama;University of South Alabama

  • Venue:
  • ICTAI '05 Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence
  • Year:
  • 2005

Abstract

Unsolicited bulk e-mail, also known as spam, has been a growing problem for e-mail users. This paper presents a new spam filtering strategy that 1) uses a practical entropy coding technique, Huffman coding, to dynamically encode the feature space of e-mail collections over time, and 2) applies an online algorithm to adaptively enhance the learned spam concept as new e-mail data becomes available. The contributions of this work include a highly efficient spam filtering algorithm in which the input space is radically reduced to a single-dimension input vector, and an adaptive learning technique that is robust to vocabulary change, concept drift, and skewed data distribution. We compare our technique to several existing off-line learning techniques, including Support Vector Machine, Naïve Bayes, k-Nearest Neighbor, C4.5 decision tree, RBFNetwork, Boosted decision tree, and Stacking, and demonstrate its effectiveness by presenting experimental results on publicly available e-mail data.
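
For readers unfamiliar with the entropy coding step named in the abstract, the following is a minimal sketch of generic Huffman coding over token frequencies. It is not the authors' implementation: the abstract does not say how tokens are extracted, how codes are updated over time, or how the encoded features are reduced to a single dimension. The function name `huffman_codes` and the sample token list are hypothetical, chosen only for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(freqs):
    """Return {token: bitstring} for a mapping of token -> frequency.

    Hypothetical helper for illustration; standard Huffman construction.
    """
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single symbol still needs a non-empty code.
        return {next(iter(freqs)): "0"}

    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {tok: ""}) for tok, f in freqs.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Merge the two least frequent subtrees.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))

    return heap[0][2]


# Assumed toy "e-mail collection": whitespace-separated tokens.
tokens = "free offer free money meeting free offer agenda".split()
freqs = Counter(tokens)
codes = huffman_codes(freqs)

for tok in sorted(codes, key=lambda t: (len(codes[t]), t)):
    print(tok, freqs[tok], codes[tok])
```

Frequent tokens receive codes no longer than those of rare tokens, so rebuilding the code table as frequencies shift is one plausible way to track vocabulary change; whether the paper does this is not stated in the abstract.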