Computing a Comprehensible Model for Spam Filtering

  • Authors:
  • Amparo Ruiz-Sepúlveda;José L. Triviño-Rodriguez;Rafael Morales-Bueno

  • Affiliations:
  • Department of Computer Science and Artificial Intelligence, University of Málaga, Málaga, Spain;Department of Computer Science and Artificial Intelligence, University of Málaga, Málaga, Spain;Department of Computer Science and Artificial Intelligence, University of Málaga, Málaga, Spain

  • Venue:
  • DS '09 Proceedings of the 12th International Conference on Discovery Science
  • Year:
  • 2009

Abstract

In this paper, we describe the application of the Decision Tree Boosting (DTB) learning model to spam email filtering. This classification task involves learning in a high-dimensional feature space, so it serves as an example of how the DTB algorithm performs in such problems. In [1], it has been shown that hypotheses computed by the DTB model are more comprehensible than those computed by other ensemble methods. Hence, this paper tries to show that the DTB algorithm maintains the same comprehensibility of hypotheses in high-dimensional feature space problems while achieving the performance of other ensemble methods. Four traditional evaluation measures (precision, recall, F1 and accuracy) have been considered for the performance comparison between DTB and other models usually applied to spam email filtering. The hypothesis computed by DTB is smaller and more comprehensible than the hypotheses computed by AdaBoost and Naïve Bayes.
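
As an illustration of the four evaluation measures named in the abstract, the following minimal Python sketch computes precision, recall, F1 and accuracy from predicted and true labels. It is not taken from the paper: the function name spam_metrics, the "spam"/"ham" label strings, the choice of spam as the positive class, and the toy data are all assumptions made for this example.

```python
def spam_metrics(y_true, y_pred, positive="spam"):
    """Return precision, recall, F1 and accuracy, treating `positive` as the target class.

    Hypothetical helper for illustration; ratios with an empty denominator default to 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam flagged as spam
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # ham flagged as spam
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam that was missed
    tn = len(pairs) - tp - fp - fn                                    # ham passed as ham

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels (made up for this example, not data from the paper).
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]
metrics = spam_metrics(y_true, y_pred)
print(metrics)
```

In spam filtering, precision and recall are usually reported for the spam class because the two kinds of error differ in cost: a legitimate message flagged as spam lowers precision, while a spam message that gets through lowers recall.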