International Journal of Computer Applications in Technology
Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms and phishing. Many anti-spam solutions use machine-learning algorithms trained on statistical representations of the terms that usually appear in e-mails. However, these methods require a training step with labelled data. When labelled training instances are scarce, the progress of filtering systems slows down, which gives spammers an advantage. Consequently, many current approaches focus on Semi-Supervised Learning (SSL). SSL lies halfway between supervised and unsupervised learning: in addition to unlabelled data, it receives some supervision information, such as the target labels of some of the examples. Collective Classification, applied to text classification, is a promising method for optimising the classification of partially labelled data. We therefore propose, for the first time, the use of Collective Classification algorithms for spam filtering, in order to cope with the large volume of unclassified e-mails sent every day.
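The abstract does not state which Collective Classification algorithms are used. As a rough illustration of the general idea (labelled and unlabelled e-mails are classified jointly, and each prediction is influenced by related e-mails), here is a minimal sketch of a simple label-propagation style of collective inference over a k-nearest-neighbour cosine-similarity graph of bag-of-words vectors. The graph construction, the function names (`term_vector`, `cosine`, `collective_classify`) and the parameters `k`, `max_iter` and `tol` are hypothetical choices for this example, not the authors' method.

```python
from collections import Counter
import math
import re


def term_vector(text):
    """Bag-of-words term-frequency vector for one e-mail."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def collective_classify(emails, labels, k=2, max_iter=50, tol=1e-6):
    """Hypothetical sketch: propagate spam scores over a k-NN similarity graph.

    emails: list of raw e-mail texts.
    labels: list with 1 (spam), 0 (legitimate) or None (unlabelled).
    Returns one 0/1 prediction per e-mail.
    """
    n = len(emails)
    if n == 0:
        return []
    vectors = [term_vector(e) for e in emails]
    # Assumed graph: link each e-mail to its k most similar e-mails (similarity > 0).
    neighbours = []
    for i in range(n):
        sims = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        neighbours.append([(s, j) for s, j in sims[:k] if s > 0])
    # Labelled e-mails keep fixed scores; unlabelled ones start undecided at 0.5.
    scores = [float(l) if l is not None else 0.5 for l in labels]
    for _ in range(max_iter):
        new_scores = list(scores)
        for i in range(n):
            if labels[i] is not None or not neighbours[i]:
                continue
            # Similarity-weighted average of the neighbours' current scores,
            # so unlabelled e-mails also influence each other across iterations.
            total = sum(s for s, _ in neighbours[i])
            new_scores[i] = sum(s * scores[j] for s, j in neighbours[i]) / total
        change = max(abs(a - b) for a, b in zip(scores, new_scores))
        scores = new_scores
        if change < tol:
            break
    # Undecided e-mails (score exactly 0.5) default to legitimate to avoid false positives.
    return [1 if s > 0.5 else 0 for s in scores]


emails = [
    "win a free prize now click here",
    "cheap pills free offer click now",
    "meeting agenda for monday project review",
    "project review notes and monday schedule",
    "click here to win free cheap prize",
    "agenda for the project meeting",
]
labels = [1, 1, 0, 0, None, None]
print(collective_classify(emails, labels))  # [1, 1, 0, 0, 1, 0]
```

In this toy run the two unlabelled e-mails take the label of their most similar labelled neighbours. On larger, sparser graphs the iterative update matters more, because scores flow through chains of unlabelled e-mails rather than only from labelled ones.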