In this paper, we study the problem of filtering unsolicited bulk email, also known as spam. We apply a k-NN algorithm with a similarity measure called resemblance and compare it with naive Bayes and with k-NN using TF-IDF weighting. Experimental evaluation shows that our method produces the lowest-cost results under different cost models of classification. Compared with TF-IDF weighting, our method is more practical in a dynamic environment. Our method also successfully catches a notorious class of spam called picospams. We believe that it will be a useful component of a hybrid classifier.
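To make the idea concrete, the following is a minimal sketch of k-NN classification driven by a resemblance score. It assumes that resemblance means the Jaccard similarity of the two messages' sets of w-word shingles; the abstract does not spell out the definition, so this is an assumption. The function names, the whitespace tokenizer, the majority vote, and the defaults for `k` and `w` are also illustrative choices, not the authors' implementation.

```python
from collections import Counter


def shingles(text, w=3):
    """Return the set of contiguous w-word shingles of a message (lowercased)."""
    tokens = text.lower().split()
    if len(tokens) < w:
        # Assumption: a message shorter than w words forms a single shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=3):
    """Jaccard similarity of the two shingle sets: |A & B| / |A | B|."""
    sa, sb = shingles(a, w), shingles(b, w)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


def knn_classify(message, training, k=3, w=3):
    """Label a message by majority vote among its k most similar training examples.

    `training` is a list of (text, label) pairs (hypothetical data layout).
    """
    ranked = sorted(training,
                    key=lambda example: resemblance(message, example[0], w),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set, invented purely for illustration.
training = [
    ("buy cheap pills online now", "spam"),
    ("buy cheap pills online today", "spam"),
    ("meeting agenda for monday morning", "ham"),
    ("lunch plans for friday afternoon", "ham"),
]
label = knn_classify("buy cheap pills online here", training, k=3, w=2)
```

Because the score depends only on the two messages being compared, adding a new training message requires no recomputation of corpus-wide statistics such as document frequencies, which is one plausible reading of why the approach suits a dynamic environment better than TF-IDF weighting.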