SDAI: An integral evaluation methodology for content-based spam filtering models

  • Authors:
  • Noemí Pérez-Díaz; David Ruano-Ordás; Florentino Fdez-Riverola; José R. Méndez

  • Affiliations:
  • Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, University of Vigo, 32004 Ourense, Spain (all four authors)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 12.05

Abstract

The Tragedy of the Commons theory introduced by Hardin (1968) revealed how shared and limited resources become completely depleted as an effect of human behaviour. By analogy, common spamming activities can be properly modelled by this solid theory and, consequently, a young Internet security industry has recently emerged to fight against spam. However, the massive intensification of spam deliveries in recent years has created the need for a significant improvement in filter accuracy. In this context, current research efforts are mainly focussed on providing a wide variety of content-based techniques able to overcome common spam filtering inconveniences. Although theoretical filtering evaluation is generally taken into consideration in scientific works, most evaluation protocols are not appropriate for correctly assessing the performance of models during filter operation in real environments. In order to bridge the gap between basic research and the applied deployment of well-known spam filtering techniques, this work proposes a novel, straightforward evaluation methodology able to rank available models from four different but complementary perspectives: static, dynamic, adaptive and internationalisation. In the present study, we applied our SDAI methodology to compare eight different well-known content-based spam filtering techniques using several established accuracy measures. Results showed the effect of the knowledge grain size and revealed several unexpected situations related to the behaviour of the analysed models.