Personalized Spam Filtering with Semi-supervised Classifier Ensemble

Authors:
Victor Cheng;C. H. Li
Affiliations:
Hong Kong Baptist University, Hong Kong;Hong Kong Baptist University, Hong Kong
Venue:
WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
Year:
2006

Citing 0
Cited 6

A Personalized Spam Filtering Approach Utilizing Two Separately Trained Filters
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 02

An innovative analyser for multi-classifier e-mail classification based on grey list analysis
Journal of Network and Computer Applications

Symbiotic Data Mining for Personalized Spam Filtering
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01

e-mail authentication system: a spam filtering for smart senders
Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human

Symbiotic filtering for spam email detection
Expert Systems with Applications: An International Journal

Batch-Mode Active Learning with Semi-supervised Cluster Tree for Text Classification
WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01

Abstract

The proliferation of unsolicited emails, also known as spam, poses a significant burden to email users worldwide. Recent research on spam filtering has shown that high accuracy can be obtained if labeled email examples are available from the particular user of the spam filter. However, the time-consuming process of providing personalized labeled training examples is often inconvenient or impossible due to privacy issues. In this paper, a semi-supervised personalized spam filter based on a classifier ensemble is proposed that classifies a user's emails accurately by learning from both generic labeled emails and personalized unlabeled emails. The proposed multi-stage classification process begins by learning an SVM model from labeled generic data. The user's unlabeled emails are then fed to this SVM to generate personalized labeled data for constructing personalized naive Bayes classifiers. Furthermore, some personalized labeled examples are generated by exploiting rare word distributions and then fed into a semi-supervised classifier. The multi-stage results are integrated with SVMs learned from generic labeled emails to produce the final classification results. Experimental results show that the proposed approaches can significantly increase classification accuracy in spam filtering.
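
The first stages described in the abstract (a generic SVM, pseudo-labeling of the user's mail, a personalized naive Bayes model, and a final combination) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy emails, the function names (`train_svm`, `train_nb`, `ensemble_label`), the bias-free linear SVM fitted by subgradient descent on the hinge loss, and the rule that simply adds the SVM score to the naive Bayes log-odds are all hypothetical choices. The rare-word stage and the semi-supervised classifier are left out.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train_svm(docs, labels, epochs=20, eta=0.1, lam=0.01):
    """Linear SVM without a bias term, fitted by subgradient descent on the
    L2-regularised hinge loss. Labels are +1 (spam) or -1 (ham)."""
    w = {}
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            feats = Counter(tokenize(doc))
            margin = y * sum(w.get(f, 0.0) * v for f, v in feats.items())
            for f in w:                      # shrink step from the L2 penalty
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:                 # hinge loss is active for this example
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def svm_score(w, doc):
    return sum(w.get(tok, 0.0) for tok in tokenize(doc))

def train_nb(docs, labels):
    """Multinomial naive Bayes counts, trained here on pseudo-labels."""
    counts = {1: Counter(), -1: Counter()}
    n_docs = Counter(labels)
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    vocab = set(counts[1]) | set(counts[-1])
    return counts, n_docs, vocab

def nb_log_odds(model, doc):
    """log P(spam | doc) - log P(ham | doc) with add-one smoothing;
    words outside the personalized vocabulary are ignored."""
    counts, n_docs, vocab = model
    odds = math.log((n_docs[1] + 1) / (n_docs[-1] + 1))
    total = {c: sum(counts[c].values()) + len(vocab) for c in (1, -1)}
    for tok in tokenize(doc):
        if tok in vocab:
            odds += math.log((counts[1][tok] + 1) / total[1])
            odds -= math.log((counts[-1][tok] + 1) / total[-1])
    return odds

def label(score):
    return 1 if score > 0 else -1

# Stage 1: SVM learned from generic labeled emails (toy data, assumed).
generic_docs = ["win free money now", "free prize claim now",
                "meeting agenda for monday", "project report attached"]
generic_labels = [1, 1, -1, -1]
svm = train_svm(generic_docs, generic_labels)

# Stage 2: the SVM pseudo-labels the user's unlabeled emails.
user_docs = ["free money prize zorgpill", "claim free zorgpill now",
             "monday project meeting budget", "report agenda budget attached"]
pseudo_labels = [label(svm_score(svm, d)) for d in user_docs]

# Stage 3: personalized naive Bayes trained on the pseudo-labeled user emails.
nb = train_nb(user_docs, pseudo_labels)

# Final stage (assumed rule): add the generic SVM score to the NB log-odds.
def ensemble_label(doc):
    return label(svm_score(svm, doc) + nb_log_odds(nb, doc))

print(pseudo_labels)
print(ensemble_label("zorgpill offer"), ensemble_label("budget review"))
```

In this toy setup the generic SVM alone gives `zorgpill offer` a score of 0, because neither word occurs in the generic data. The personalized naive Bayes model has picked up `zorgpill` from the pseudo-labeled user mail, so the combined score becomes positive. How the paper actually weights and integrates its stages is not specified in the abstract.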