Adaptive Spam Filtering Using Dynamic Feature Space

Authors:
Yan Zhou; Madhuri S. Mulekar; Praveen Nerellapalli
Affiliations:
University of South Alabama; University of South Alabama; University of South Alabama
Venue:
ICTAI '05 Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence
Year:
2005

Abstract

Unsolicited bulk e-mail, also known as spam, has become a growing problem for the e-mail community. This paper presents a new spam filtering strategy that (1) uses a practical entropy coding technique, Huffman coding, to dynamically encode the feature space of e-mail collections over time, and (2) applies an online algorithm to adaptively enhance the learned spam concept as new e-mail data becomes available. The contributions of this work include a highly efficient spam filtering algorithm in which the input space is radically reduced to a single-dimension input vector, and an adaptive learning technique that is robust to vocabulary change, concept drift, and skewed data distributions. We compare our technique to several existing off-line learning techniques, including Support Vector Machine, Naïve Bayes, k-Nearest Neighbor, C4.5 decision tree, RBFNetwork, Boosted decision tree, and Stacking, and demonstrate its effectiveness with experimental results on publicly available e-mail data.
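
For readers unfamiliar with Huffman coding, the following is a minimal sketch of the general technique applied to e-mail tokens. It is not the authors' implementation: the whitespace tokenizer, the toy token list, and the names huffman_codes and encoded_bits are assumptions made for illustration. The abstract does not say how the codes are mapped to a single-dimension input, so the sketch stops at building the code table and measuring the encoded length of a message.

```python
import heapq
from collections import Counter


def huffman_codes(freqs):
    """Build a Huffman code table {token: bitstring} from token frequencies."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single symbol still needs a one-bit code.
        return {next(iter(freqs)): "0"}
    # Heap entries are (weight, tie_breaker, {token: partial_code}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {tok: ""}) for i, (tok, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # lightest subtree
        w2, _, right = heapq.heappop(heap)   # second lightest subtree
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(tokens, codes):
    """Total number of bits needed to encode tokens with the given table."""
    return sum(len(codes[t]) for t in tokens)


# Hypothetical toy data: tokens seen so far in an e-mail collection.
tokens = "free money free offer free money meeting".split()
codes = huffman_codes(Counter(tokens))
total_bits = encoded_bits(tokens, codes)

for tok in sorted(codes, key=lambda t: (len(codes[t]), t)):
    print(tok, codes[tok])
print("total bits:", total_bits)
```

Frequent tokens receive short codes and rare tokens receive long ones. In a dynamic setting such as the one the abstract describes, the frequency table would presumably be updated as new e-mail arrives and the code table rebuilt; that update schedule is an assumption here, not something the abstract specifies.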