Personalized content-based spam filters, which add a user's personal e-mails to the training set, are believed to classify e-mails with higher accuracy. However, a filter trained on both spam and personal mail may have difficulty classifying e-mails that share characteristics of both spam and ham. In this paper, we propose a two-tier approach that uses two filters, one trained only on personal mail and the other only on spam. E-mails that the legitimate mail filter classifies as legitimate pass through, while the remaining e-mails are processed by the spam filter in the ordinary way. Experiments are performed on two mail servers: one equipped with an ordinary spam filter, and the other equipped with both the legitimate mail filter and the spam filter. By combining the two filters with tuned thresholds, we observe a much lower false positive rate at the same false negative rate compared with the ordinary filter.
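The two-tier decision flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which models or threshold values are used, so the scorer functions, the keyword-based toy scorer, and the names `two_tier_classify`, `ham_threshold`, and `spam_threshold` are all hypothetical stand-ins.

```python
from typing import Callable, Set

# A scorer maps a message to a confidence in [0, 1].
# Hypothetical interface; the paper's actual filters are not specified here.
Scorer = Callable[[str], float]


def keyword_scorer(keywords: Set[str]) -> Scorer:
    """Toy stand-in for a trained filter: fraction of words found in `keywords`."""
    def score(message: str) -> float:
        words = message.lower().split()
        if not words:
            return 0.0
        return sum(1 for w in words if w in keywords) / len(words)
    return score


def two_tier_classify(
    message: str,
    ham_score: Scorer,
    spam_score: Scorer,
    ham_threshold: float,
    spam_threshold: float,
) -> str:
    """Tier 1: the legitimate mail filter may pass the message outright.
    Tier 2: everything else goes through the spam filter as usual."""
    if ham_score(message) >= ham_threshold:
        return "ham"
    return "spam" if spam_score(message) >= spam_threshold else "ham"


# Assumed toy vocabularies standing in for models trained on
# personal mail only and on spam only, respectively.
ham_filter = keyword_scorer({"alice", "meeting", "report"})
spam_filter = keyword_scorer({"free", "offer", "winner"})

mixed = "alice free meeting report"   # has both ham-like and spam-like words
junk = "free offer winner today"

print(two_tier_classify(mixed, ham_filter, spam_filter, 0.5, 0.2))
print(two_tier_classify(junk, ham_filter, spam_filter, 0.5, 0.2))
```

In this toy setup the mixed message would be flagged by the spam scorer alone at the chosen threshold, but the first tier lets it pass. That is the kind of false positive the two-tier design aims to reduce. The thresholds here are arbitrary, whereas the abstract describes tuning them.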