Filtering spams using the minimum description length principle

  • Authors:
  • Tiago A. Almeida; Akebo Yamakami; Jurandy Almeida

  • Affiliations:
  • University of Campinas, Campinas, SP, Brazil; University of Campinas, Campinas, SP, Brazil; University of Campinas, Campinas, SP, Brazil

  • Venue:
  • Proceedings of the 2010 ACM Symposium on Applied Computing
  • Year:
  • 2010

Abstract

Spam has become an increasingly serious problem with a significant economic impact on society. Spam filtering is a special case of text categorization whose defining characteristic is that filters face an active adversary that constantly attempts to evade them. In this paper, we present a novel approach to spam filtering based on the minimum description length (MDL) principle. The proposed model is fast to construct and incrementally updatable. Additionally, we analyze the measures commonly used to evaluate the quality of anti-spam classifiers and, to provide a fairer comparison, propose a new measure. Furthermore, we conducted an empirical evaluation using six well-known, large, public datasets. The results indicate that our approach outperforms state-of-the-art spam filters.
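To make the MDL idea concrete, the sketch below illustrates the general principle only and is not the authors' model: each class (ham, spam) keeps token counts, the description length of a message under a class is the total code length in bits, computed as the sum of -log2 P(token | class), and the message is assigned to the class that describes it most compactly. The tokenizer, the Laplace smoothing constant `alpha`, and the names `MDLSpamFilter` and `tokenize` are hypothetical choices made for illustration; the paper's actual coding scheme may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


class MDLSpamFilter:
    """Illustrative MDL-style classifier (a sketch, not the paper's model).

    Each class keeps token counts. A message is assigned to the class
    under which it has the shortest description length in bits.
    """

    def __init__(self, classes=("ham", "spam"), alpha=1.0):
        self.alpha = alpha  # assumed Laplace smoothing constant
        self.counts = {c: Counter() for c in classes}
        self.totals = {c: 0 for c in classes}
        self.vocab = set()

    def update(self, text, label):
        # Incremental update: only the counters change, with no retraining pass.
        tokens = tokenize(text)
        self.counts[label].update(tokens)
        self.totals[label] += len(tokens)
        self.vocab.update(tokens)

    def description_length(self, text, label):
        # Code length in bits: sum of -log2 P(token | class), Laplace-smoothed.
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen tokens
        denom = self.totals[label] + self.alpha * v
        bits = 0.0
        for tok in tokenize(text):
            p = (self.counts[label][tok] + self.alpha) / denom
            bits -= math.log2(p)
        return bits

    def classify(self, text):
        # MDL decision rule: choose the class giving the shortest code.
        return min(self.counts, key=lambda c: self.description_length(text, c))


if __name__ == "__main__":
    f = MDLSpamFilter()
    f.update("cheap pills buy now", "spam")
    f.update("win money now click", "spam")
    f.update("meeting agenda for monday", "ham")
    f.update("lunch with the project team", "ham")
    print(f.classify("buy cheap pills"))  # spam
    print(f.classify("agenda for the meeting"))  # ham
```

Because training consists only of incrementing counters, a new labeled message can be absorbed with a single `update` call. This mirrors, in simplified form, the "fast to construct and incrementally updatable" property the abstract claims for the proposed model.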