Collective classification for spam filtering

  • Authors:
  • Carlos Laorden;Borja Sanz;Igor Santos;Patxi Galán-García;Pablo G. Bringas

  • Affiliations:
  • DeustoTech Computing, S3Lab, University of Deusto, Bilbao, Spain (all five authors)

  • Venue:
  • CISIS'11: Proceedings of the 4th International Conference on Computational Intelligence in Security for Information Systems
  • Year:
  • 2011

Abstract

Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms and phishing. Many solutions use machine-learning algorithms trained on statistical representations of the terms that typically appear in e-mails. However, these methods require a training step with labelled data. When labelled training instances are scarce, the progress of filtering systems slows down and spammers gain an advantage. Currently, many approaches focus on Semi-Supervised Learning (SSL). SSL lies halfway between supervised and unsupervised learning: in addition to unlabelled data, it receives some supervision information, such as the target labels of some of the examples. Collective Classification for text classification is a promising method for improving the classification of partially labelled data. We therefore propose, for the first time, the use of Collective Classification algorithms for spam filtering, in order to cope with the large volume of unclassified e-mails sent every day.
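To illustrate the general idea of collective classification on partially labelled e-mails (unlabelled messages are classified jointly, so each prediction is informed by the current predictions for similar messages), here is a minimal sketch. It assumes a bag-of-words term-frequency representation with whitespace tokenisation, cosine similarity, a k-nearest-neighbour graph, and a simple iterative neighbour-averaging update (a form of label propagation). It is not the specific set of algorithms evaluated in the paper, and the names `tf_vector`, `cosine` and `collective_classify` are hypothetical.

```python
from collections import Counter
import math


def tf_vector(text):
    # Term-frequency vector; whitespace tokenisation is a simplifying assumption.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def collective_classify(emails, labels, k=2, iterations=10):
    """Hypothetical sketch: labels[i] is 1 (spam), 0 (legitimate) or None (unlabelled)."""
    vecs = [tf_vector(e) for e in emails]
    n = len(emails)
    # Build the k-nearest-neighbour graph as (similarity, index) pairs.
    neighbours = []
    for i in range(n):
        sims = sorted(
            ((cosine(vecs[i], vecs[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        neighbours.append(sims[:k])
    # Labelled e-mails are fixed at 0/1; unlabelled ones start at an uninformative 0.5.
    score = [float(l) if l is not None else 0.5 for l in labels]
    for _ in range(iterations):
        new = score[:]
        for i in range(n):
            if labels[i] is not None:
                continue  # known labels are never changed
            wsum = sum(s for s, _ in neighbours[i])
            if wsum > 0:
                # Similarity-weighted average of the neighbours' current spam scores.
                new[i] = sum(s * score[j] for s, j in neighbours[i]) / wsum
        score = new
    # Ties (e.g. e-mails with no similar neighbours) default to legitimate (assumption).
    return [1 if s > 0.5 else 0 for s in score]


if __name__ == "__main__":
    emails = [
        "cheap viagra buy now",         # labelled spam
        "meeting agenda for monday",    # labelled legitimate
        "buy cheap pills now",          # unlabelled
        "monday meeting notes agenda",  # unlabelled
        "cheap pills buy now offer",    # unlabelled
    ]
    labels = [1, 0, None, None, None]
    print(collective_classify(emails, labels))  # → [1, 0, 1, 0, 1]
```

Note that the last unlabelled message has only a partial overlap with the labelled spam, yet it is pulled towards the spam label through its strong similarity to another unlabelled message; this mutual influence among unlabelled instances is what distinguishes a collective approach from classifying each e-mail in isolation.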