GNUsmail: Open Framework for On-line Email Classification

  • Authors:
  • José M. Carmona-Cejudo; Manuel Baena-García; José del Campo-Ávila; Rafael Morales-Bueno; Albert Bifet

  • Affiliations:
  • Universidad de Málaga, Spain, email: {jmcarmona, mbaena, jcampo, morales}@lcc.uma.es (first four authors); University of Waikato, New Zealand, email: abifet@cs.waikato.ac.nz (Albert Bifet)

  • Venue:
  • Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence
  • Year:
  • 2010

Abstract

Real-time classification of massive email data is a challenging task with its own particular difficulties. Because email data has an important temporal component, several problems arise: emails arrive continuously, and the criteria used to classify them can change over time, so learning algorithms must be able to deal with concept drift. Our problem is more general than spam detection, which has received much more attention in the literature. In this paper we present GNUsmail, an open-source, extensible framework for email classification whose structure supports incremental and on-line learning. The framework enables the incorporation of algorithms developed by other researchers, such as those included in WEKA and MOA. We evaluate the framework, which is characterized by two overlapping phases (pre-processing and learning), on the ENRON dataset, and we compare the results achieved by the WEKA and MOA algorithms.
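
To make the idea of incremental, on-line classification more concrete, the following is a minimal sketch in Python. It is not GNUsmail's code and does not use the WEKA or MOA APIs. It assumes a whitespace tokenizer, a small multinomial naive Bayes learner, and a test-then-train (prequential) evaluation loop. The class name, function name, and toy messages are hypothetical, and the sketch makes no attempt to handle concept drift.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Hypothetical multinomial naive Bayes updated one message at a time."""

    def __init__(self):
        self.class_counts = Counter()            # messages seen per folder
        self.word_counts = defaultdict(Counter)  # word frequencies per folder
        self.vocabulary = set()

    def predict(self, words):
        if not self.class_counts:
            return None  # nothing has been learned yet
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            # Laplace smoothing, with one extra slot reserved for unseen words.
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary) + 1
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def learn(self, words, label):
        self.class_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocabulary.update(words)


def prequential_accuracy(stream, model):
    """Test-then-train: predict each message before learning from it."""
    correct = 0
    seen = 0
    for text, label in stream:
        words = text.lower().split()  # assumed pre-processing: lowercase + split
        if model.predict(words) == label:
            correct += 1
        model.learn(words, label)
        seen += 1
    return correct / seen if seen else 0.0


# Toy stream of (message text, folder) pairs; purely illustrative data.
stream = [
    ("meeting agenda for monday", "work"),
    ("family dinner on sunday", "personal"),
    ("agenda for the budget meeting", "work"),
    ("sunday dinner with family", "personal"),
]
accuracy = prequential_accuracy(stream, IncrementalNaiveBayes())
```

Because each message is scored before the model is updated with it, every message serves as both test and training data, which suits a setting where emails arrive continuously. A drift-aware learner would additionally need some way to forget or down-weight old messages.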