Learning to filter junk e-mail from positive and unlabeled examples

Authors:
Karl-Michael Schneider
Affiliations:
Department of General Linguistics, University of Passau
Venue:
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Year:
2004

Citing 8
Cited 1

Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Bayesian online classifiers for text classification and filtering

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Partially Supervised Classification of Text Documents

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
PAC Learning from Positive Statistical Queries

ALT '98 Proceedings of the 9th International Conference on Algorithmic Learning Theory
PEBL: positive example based learning for Web page classification using SVM

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
One-class svms for document classification

The Journal of Machine Learning Research
Learning to classify texts using positive and unlabeled data

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Support vector machines for spam categorization

IEEE Transactions on Neural Networks

Learning from positive and unlabelled examples using maximum margin clustering

ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part III

Quantified Score

Hi-index	0.00

Visualization

Abstract

We study the applicability of partially supervised text classification to junk mail filtering, where a given set of junk messages serve as positive examples while the messages received by a user are unlabeled examples, but there are no negative examples. Supplying a junk mail filter with a large set of junk mails could result in an algorithm that learns to filter junk mail without user intervention and thus would significantly improve the usability of an e-mail client. We study several learning algorithms that take care of the unlabeled examples in different ways and present experimental results.