Adapting Bayesian statistical spam filters to the server side

  • Authors: Sara Sinclair

  • Affiliations: Wellesley College

  • Venue: Journal of Computing Sciences in Colleges
  • Year: 2004

Abstract

Bayesian spam filters, which take their name from Bayes' rule for combining conditional probabilities, have become very popular since the publication of Paul Graham's "A Plan for Spam" (http://www.paulgraham.com) in August 2002. These statistics-based filters use machine learning to gather data about the tokens (words or strings that have been deemed significant) contained in a user's email. They then use this information to analyze the content of a new message and estimate the probability that it is spam. Machine learning allows Bayesian filters to adapt to new trends in spam and also makes them hard for spammers to fool. These features, combined with their high accuracy and low false-positive rate, make them attractive to anyone interested in reducing the amount of spam in their inbox.
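The abstract describes the mechanism only in general terms: learn statistics about tokens, then combine them into a spam probability for a new message. The Python sketch below illustrates those two stages under stated assumptions. It loosely follows the scheme popularized in Graham's essay (per-token probabilities clamped to [0.01, 0.99], a default of 0.4 for unseen tokens, and the 15 tokens farthest from 0.5 combined with the naive Bayes product rule P = Πp / (Πp + Π(1 − p))). The tokenizer, the function names `tokenize`, `train` and `spam_probability`, and the use of per-message rather than per-occurrence counts are simplifying assumptions made for illustration; this is not the paper's server-side implementation.

```python
import math
import re
from collections import Counter

# Assumption (hypothetical tokenizer): a token is a lowercase run of
# letters, digits, '$', '!', '-' or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9$!'-]+")


def tokenize(text):
    """Return the set of distinct tokens in a message."""
    return set(TOKEN_RE.findall(text.lower()))


def train(spam_msgs, ham_msgs):
    """Estimate P(spam | token) for every token seen in the training mail."""
    spam_counts, ham_counts = Counter(), Counter()
    for msg in spam_msgs:
        spam_counts.update(tokenize(msg))  # counts messages containing each token
    for msg in ham_msgs:
        ham_counts.update(tokenize(msg))
    n_spam, n_ham = max(len(spam_msgs), 1), max(len(ham_msgs), 1)
    probs = {}
    for tok in spam_counts.keys() | ham_counts.keys():
        p_tok_spam = spam_counts[tok] / n_spam  # fraction of spam containing tok
        p_tok_ham = ham_counts[tok] / n_ham     # fraction of ham containing tok
        # Bayes' rule assuming equal prior probability of spam and ham.
        p = p_tok_spam / (p_tok_spam + p_tok_ham)
        # Clamp so that no single token can decide the verdict on its own.
        probs[tok] = min(0.99, max(0.01, p))
    return probs


def spam_probability(text, probs, unknown=0.4, max_tokens=15):
    """Combine the most extreme token probabilities into one spam score."""
    ps = [probs.get(tok, unknown) for tok in tokenize(text)]
    # Keep only the tokens whose probabilities are farthest from neutral (0.5).
    ps = sorted(ps, key=lambda p: abs(p - 0.5), reverse=True)[:max_tokens]
    if not ps:
        return 0.5  # no evidence either way
    # Product rule P = prod(p) / (prod(p) + prod(1 - p)), computed in log space
    # to avoid floating-point underflow on long messages.
    log_spam = sum(math.log(p) for p in ps)
    log_ham = sum(math.log(1 - p) for p in ps)
    return 1 / (1 + math.exp(log_ham - log_spam))


if __name__ == "__main__":
    spam = ["cheap pills buy now", "buy cheap watches now"]
    ham = ["meeting notes for monday", "lunch on monday?"]
    probs = train(spam, ham)
    print(spam_probability("buy cheap pills", probs))  # close to 1
    print(spam_probability("lunch monday", probs))     # close to 0
```

Because the model is just a token-to-probability table, retraining on newly classified mail is what lets such a filter track new spam trends, as the abstract notes.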