Tracking concept drift at feature selection stage in SpamHunting: an anti-spam instance-based reasoning system

  • Authors:
  • J. R. Méndez; F. Fdez-Riverola; E. L. Iglesias; F. Díaz; J. M. Corchado

  • Affiliations:
  • Dept. Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Ourense, Spain; Dept. Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Ourense, Spain; Dept. Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Ourense, Spain; Dept. Informática, University of Valladolid, Escuela Universitaria de Informática, Segovia, Spain; Dept. Informática y Automática, University of Salamanca, Salamanca, Spain

  • Venue:
  • ECCBR'06 Proceedings of the 8th European conference on Advances in Case-Based Reasoning
  • Year:
  • 2006

Abstract

In this paper we propose a novel feature selection method able to handle concept drift in the spam filtering domain. The proposed technique is applied to SpamHunting, a previously successful instance-based reasoning e-mail filtering system. Our information criterion is based on several ideas drawn from the well-known information measure introduced by Shannon. We show that the results obtained by our previous system, combined with the improved feature selection method, outperform classical machine learning techniques and other well-known lazy learning approaches. To evaluate the performance of all the analysed models, we employ two different corpora and six well-known metrics in various scenarios.
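The abstract does not spell out the exact criterion, so the following is only an illustrative sketch of Shannon-based, per-message term selection, not the authors' method. It assumes that each term t is scored by its contribution -p(t) log2 p(t) to the entropy of a single message, where p(t) is the term's relative frequency in that message, and that terms are kept, highest score first, until a hypothetical `coverage` fraction of the message's total entropy is reached. The function names `term_information` and `select_features` and the tokenisation regex are assumptions introduced for illustration.

```python
import math
import re
from collections import Counter


def term_information(message: str) -> dict[str, float]:
    """Score each term by its Shannon contribution -p * log2(p), where p is the
    term's relative frequency within this one message (illustrative assumption)."""
    terms = re.findall(r"[a-z0-9]+", message.lower())
    counts = Counter(terms)
    total = sum(counts.values())
    return {t: -(c / total) * math.log2(c / total) for t, c in counts.items()}


def select_features(message: str, coverage: float = 0.8) -> list[str]:
    """Keep the highest-scoring terms until they account for `coverage` of the
    message's total entropy. `coverage` is a hypothetical threshold, not a value
    taken from the paper."""
    scores = term_information(message)
    total = sum(scores.values())
    selected: list[str] = []
    accumulated = 0.0
    # Sort by descending score; ties are broken alphabetically for determinism.
    for term, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if total > 0 and accumulated >= coverage * total:
            break
        selected.append(term)
        accumulated += score
    return selected


if __name__ == "__main__":
    print(select_features("free money free money free offer now", coverage=0.5))
    # prints ['free', 'money']
```

Because the feature set is recomputed from each incoming message rather than fixed from a global vocabulary, a selection of this kind can follow shifts in the terms spammers use over time, which is one plausible reading of how feature selection relates to concept drift here.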