Collective classification for spam filtering

Authors:
Carlos Laorden;Borja Sanz;Igor Santos;Patxi Galán-García;Pablo G. Bringas
Affiliations:
DeustoTech Computing, S³Lab, University of Deusto, Bilbao, Spain;DeustoTech Computing, S³Lab, University of Deusto, Bilbao, Spain;DeustoTech Computing, S³Lab, University of Deusto, Bilbao, Spain;DeustoTech Computing, S³Lab, University of Deusto, Bilbao, Spain;DeustoTech Computing, S³Lab, University of Deusto, Bilbao, Spain
Venue:
CISIS'11 Proceedings of the 4th international conference on Computational intelligence in security for information systems
Year:
2011

Abstract

Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms and phishing. Many solutions rely on machine-learning algorithms trained on statistical representations of the terms that usually appear in e-mails. These methods, however, require a training step with labelled data. When labelled training instances are scarce, the progress of filtering systems slows and spammers gain an advantage. Many current approaches therefore turn to Semi-Supervised Learning (SSL), a middle ground between supervised and unsupervised learning that, in addition to unlabelled data, receives some supervision information, such as the target labels for some of the examples. Collective classification for text classification is an interesting method for optimising the classification of partially-labelled data. We therefore propose, for the first time, collective classification algorithms for spam filtering, to cope with the large volume of unclassified e-mails sent every day.
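
To make the partially-labelled setting concrete, here is a minimal sketch in Python. It is not the collective classification approach proposed in the paper. It uses self-training, a simpler semi-supervised scheme, with a small multinomial Naive Bayes model over term counts. All function names, the toy messages and the 0.75 confidence threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenisation; a stand-in for real term extraction."""
    return text.lower().split()


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model from term counts per class."""
    classes = sorted(set(labels))
    term_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        term_counts[label].update(tokenize(doc))
    vocab = set().union(*term_counts.values())
    return classes, term_counts, doc_counts, vocab


def predict_proba(model, doc):
    """Return {class: posterior probability} for one message."""
    classes, term_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for c in classes:
        total_terms = sum(term_counts[c].values())
        score = math.log(doc_counts[c] / total_docs)
        for tok in tokenize(doc):
            if tok in vocab:  # terms never seen in training are ignored
                # Add-one (Laplace) smoothing over the vocabulary.
                score += math.log((term_counts[c][tok] + 1)
                                  / (total_terms + len(vocab)))
        log_scores[c] = score
    # Normalise in log space to avoid numerical underflow.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


def classify(model, doc):
    """Return the most probable class for one message."""
    probs = predict_proba(model, doc)
    return max(probs, key=probs.get)


def self_train(labelled, unlabelled, threshold=0.75, max_rounds=5):
    """Self-training: repeatedly add confidently predicted unlabelled
    messages to the training set, then refit the model."""
    docs = [d for d, _ in labelled]
    labels = [y for _, y in labelled]
    pool = list(unlabelled)
    for _ in range(max_rounds):
        if not pool:
            break
        model = train_nb(docs, labels)
        remaining = []
        for doc in pool:
            probs = predict_proba(model, doc)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                docs.append(doc)       # accept the pseudo-label
                labels.append(best)
            else:
                remaining.append(doc)  # keep for a later round
        if len(remaining) == len(pool):
            break  # nothing was confident enough; stop early
        pool = remaining
    return train_nb(docs, labels)


# Toy data (hypothetical): two labelled messages, three unlabelled ones.
labelled = [("win free money now", "spam"),
            ("meeting agenda for monday", "ham")]
unlabelled = ["free money offer",
              "monday meeting notes",
              "claim your free prize now"]

model = self_train(labelled, unlabelled)
print(classify(model, "free prize money"))
print(classify(model, "agenda for the meeting"))
```

In this sketch the unlabelled messages contribute terms such as "prize" and "notes" that never appear in the labelled set, which is the kind of benefit semi-supervised methods aim for when labelled e-mails are scarce. Collective classification algorithms use the unlabelled instances differently, so this should be read only as an illustration of the problem setting.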