Adaptive email spam filtering based on information theory

  • Authors: Xin Zhang; Wenyuan Dai; Gui-Rong Xue; Yong Yu
  • Affiliation (all authors): Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  • Venue: WISE'07, Proceedings of the 8th International Conference on Web Information Systems Engineering
  • Year: 2007

Abstract

Most previous email spam filtering techniques rely on traditional classification learning, which assumes that the training and test data are drawn from the same underlying distribution. In practice, however, this identical-distribution assumption is often violated. Email service providers typically collect training data from various publicly available sources, while the filtering task targets individual users' inboxes. Topics vary from one user's mailbox to another, so the distributions shift. In this paper, we propose an adaptive email spam filtering algorithm based on information theory that relaxes the identical-distribution assumption and adapts the knowledge learned from one distribution to another. Our work focuses on content analysis and minimizes the loss in mutual information between email instances and word features before and after classification. We present theoretical and empirical analyses showing that our algorithm solves the adaptive email spam filtering problem well. The experimental results show that our algorithm greatly improves filtering accuracy over traditional classification algorithms while scaling very well.
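
The quantity named in the abstract, the loss in mutual information between email instances and word features once emails are grouped into classes, can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It only computes I(emails; words) from a toy count table, then I(classes; words) after merging the rows that share a label, and reports the difference. The function names, the toy counts, and the labels are all hypothetical and chosen for illustration.

```python
import math


def mutual_information(counts):
    """Mutual information in bits of a joint count table (rows x columns)."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:
                # p(x, y) / (p(x) p(y)) simplifies to c * total / (row * col).
                ratio = c * total / (row_sums[i] * col_sums[j])
                mi += (c / total) * math.log2(ratio)
    return mi


def merge_rows(counts, labels):
    """Sum the rows that share a label, giving a class-by-word table."""
    merged = {}
    for row, label in zip(counts, labels):
        acc = merged.setdefault(label, [0] * len(row))
        for j, c in enumerate(row):
            acc[j] += c
    return [merged[key] for key in sorted(merged)]


def mi_loss(counts, labels):
    """Mutual information lost when emails are replaced by their class labels."""
    return mutual_information(counts) - mutual_information(merge_rows(counts, labels))


# Hypothetical toy data: 4 emails (rows) by 3 words (columns), raw word counts.
emails = [
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 3, 1],
]
sensible_labels = ["spam", "spam", "ham", "ham"]
arbitrary_labels = ["spam", "ham", "spam", "ham"]

print("loss with sensible labels: ", mi_loss(emails, sensible_labels))
print("loss with arbitrary labels:", mi_loss(emails, arbitrary_labels))
```

In this toy table, the labeling that groups emails with similar word usage loses less mutual information than the arbitrary one, which is the sense in which a small loss indicates a classification that preserves the relationship between emails and words.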