An evaluation of Naive Bayes variants in content-based learning for spam filtering

Authors:
Alexander K. Seewald
Affiliations:
Seewald Solutions, A-1180 Vienna, Austria. Tel.: +43(664) 110 68 86 / Fax: +43(1) 2533033 2764 / E-mail: alex@seewald.at
Venue:
Intelligent Data Analysis
Year:
2007

Abstract

We describe an in-depth analysis of the spam-filtering performance of a simple Naive Bayes learner and two extended variants. For evaluation we used a set of seven mailboxes comprising about 65,000 mails from seven different users, as well as a representative snapshot of 25,000 mails received by a single user over 18 weeks. Our main motivation was to test whether two extended variants of Naive Bayes learning, SA-Train and CRM114, are superior to simple Naive Bayes learning, represented by SpamBayes. Surprisingly, we found that the three systems performed remarkably similarly and that the extended systems have significant weaknesses that are not apparent in the simpler Naive Bayes learner. SpamBayes also offers the most stable performance, in that it deteriorates least over time. Overall, SpamBayes should be preferred over the more complex variants.
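For readers unfamiliar with the baseline technique, the following is a minimal sketch of a textbook multinomial Naive Bayes spam filter with add-one smoothing. It is an illustration only: it is not the algorithm used by SpamBayes, SA-Train, or CRM114, and the class name `NaiveBayesFilter`, the whitespace tokenizer, and the zero decision threshold are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayesFilter:
    """Textbook multinomial Naive Bayes (illustrative, not any evaluated system)."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one labelled mail; label must be 'spam' or 'ham'."""
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def spam_log_odds(self, text):
        """Return log P(spam | text) - log P(ham | text) under the naive assumption."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        # One extra slot for unseen tokens keeps the denominators non-zero.
        slots = len(vocab) + 1
        n_spam = sum(self.token_counts["spam"].values())
        n_ham = sum(self.token_counts["ham"].values())

        # Smoothed prior log-odds from the class frequencies.
        score = math.log((self.doc_counts["spam"] + 1) / (self.doc_counts["ham"] + 1))
        for tok in tokenize(text):
            p_spam = (self.token_counts["spam"][tok] + 1) / (n_spam + slots)
            p_ham = (self.token_counts["ham"][tok] + 1) / (n_ham + slots)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, text):
        """Classify as spam when the log-odds are positive (assumed threshold)."""
        return self.spam_log_odds(text) > 0
```

A filter of this kind would be trained by calling `train` on each labelled mail and then queried with `is_spam`. In practice the threshold is usually tuned to keep false positives low rather than fixed at zero.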