Awarded Best Paper! - Scalable Centralized Bayesian Spam Mitigation with Bogofilter

Authors:
Jeremy Blosser;David Josephsen
Affiliations:
VHA Inc.;VHA Inc.
Venue:
LISA '04 Proceedings of the 18th USENIX conference on System administration
Year:
2004

Citing 0
Cited 8

An effective defense against email spam laundering
Proceedings of the 13th ACM conference on Computer and communications security

Thwarting E-mail Spam Laundering
ACM Transactions on Information and System Security (TISSEC)

Measurement and classification of humans and bots in internet chat
SS'08 Proceedings of the 17th conference on Security symposium

Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications

Detection of spam hosts and spam bots using network flow traffic modeling
LEET'10 Proceedings of the 3rd USENIX conference on Large-scale exploits and emergent threats: botnets, spyware, worms, and more

Multimodal social intelligence in a real-time dashboard system
The VLDB Journal — The International Journal on Very Large Data Bases

Humans and bots in internet chat: measurement, analysis, and automated classification
IEEE/ACM Transactions on Networking (TON)

SpaDeS: Detecting spammers at the source network
Computer Networks: The International Journal of Computer and Telecommunications Networking

Abstract

Bayesian content filters gained popular acclaim when they were put forward in 2002 by Paul Graham as a potential long-term solution for the spam problem. They have since fallen from the limelight, however, due to perceived attack vulnerabilities inherent to all content-based filters as well as real and imagined vulnerabilities specific to Bayesian filters. It has also been assumed that Bayesian filters would be problematic to implement in centralized or large environments due to wordlist management issues. This paper revisits the effectiveness of Bayesian filters as a sustainable singular spam solution for mid- to large-sized environments through a real-world study of the deployment and operation of the Bogofilter Robinson-Fisher Bayesian classification utility in a production mail environment servicing thousands of accounts. Our implementation strategy and methodology as well as our results are described in detail so that they can be evaluated and replicated if desired. Other filtering methodologies which were previously implemented in this environment are also discussed for comparison purposes, though they have since been removed from production due primarily to lack of need. Bayesian classification has been able to solve the spam problem for this user population for the present and observable future, with a single wordlist, and with no secondary spam filtering techniques employed. Significantly, only two business-related legitimate messages have been reported as blocked due to filter misclassification since Bogofilter was deployed.
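
For readers unfamiliar with the Robinson-Fisher classification named in the abstract, the following is a minimal sketch of the general technique as Gary Robinson described it: each token gets a smoothed spam probability, and the per-token values are combined with Fisher's chi-square method. It is not Bogofilter's code. The function names, the example counts, and the smoothing values `s=1.0` and `x=0.5` are illustrative assumptions, not Bogofilter defaults or figures from the paper.

```python
import math

def chi2_survival(x2, dof):
    """P(X >= x2) for a chi-square variable with an even number of degrees of freedom."""
    assert dof % 2 == 0 and dof > 0
    m = x2 / 2.0
    term = math.exp(-m)  # may underflow to 0.0 for very long messages; ignored in this sketch
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def token_spamminess(spam_count, ham_count, n_spam, n_ham, s=1.0, x=0.5):
    """Robinson's smoothed estimate f(w) = (s*x + n*p(w)) / (s + n).

    s (strength of the prior) and x (assumed value for unseen tokens)
    are illustrative choices for this sketch.
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # never-seen token: fall back to the prior
    b = spam_count / n_spam  # fraction of spam messages containing the token
    g = ham_count / n_ham    # fraction of non-spam messages containing the token
    p = b / (b + g)
    return (s * x + n * p) / (s + n)

def fisher_score(token_probs):
    """Combine per-token values into one indicator in [0, 1]; values near 1 suggest spam."""
    n = len(token_probs)
    if n == 0:
        return 0.5
    # Sums of logs stand in for the log of the product, to avoid underflow.
    ln_f = sum(math.log(f) for f in token_probs)
    ln_not_f = sum(math.log(1.0 - f) for f in token_probs)
    h = chi2_survival(-2.0 * ln_f, 2 * n)
    sp = chi2_survival(-2.0 * ln_not_f, 2 * n)
    return (1.0 + h - sp) / 2.0

if __name__ == "__main__":
    # Hypothetical wordlist counts: (spam messages, non-spam messages) containing each token.
    counts = {"viagra": (50, 0), "meeting": (1, 40), "free": (30, 10)}
    probs = [token_spamminess(sc, hc, 100, 100) for sc, hc in counts.values()]
    print(fisher_score(probs))
```

A single shared wordlist, as in the deployment the abstract describes, would correspond here to one `counts` table used for every recipient; how the authors actually built and maintained theirs is covered in the paper itself.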