Spam filters: Bayes vs. chi-squared; letters vs. words

Authors:
Cormac O'Brien; Carl Vogel
Affiliations:
Trinity College, University of Dublin; Trinity College, University of Dublin
Venue:
ISICT '03 Proceedings of the 1st international symposium on Information and communication technologies
Year:
2003


Abstract

We compare two statistical methods for identifying spam, or junk electronic mail. Spam filters are classifiers that determine whether an email is junk. The proliferation of spam email has made electronic filtering vitally important, and we discuss the magnitude of the problem. We examine the Naive Bayesian method in relation to the 'Chi by degrees of Freedom' approach, which is used in the field of authorship identification. Both methods produce very promising results. However, 'Chi by degrees of Freedom' has the advantage of providing significance measures, which will help to reduce false positives. Statistics based on character-level tokenization prove more effective than those based on word-level tokenization.
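
To make the comparison concrete, the following is a minimal sketch of a chi-squared-by-degrees-of-freedom score, assuming it is computed as a homogeneity test over the token counts of two text samples and divided by (vocabulary size - 1). The abstract does not specify the authors' procedure, so the function names (`char_ngrams`, `word_tokens`, `chi_by_df`, `classify`), the tokenizers, and the toy texts are all hypothetical illustrations rather than the paper's implementation.

```python
from collections import Counter


def char_ngrams(text, n=1):
    """Character-level tokens: all overlapping n-grams of the text (assumed scheme)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_tokens(text):
    """Word-level tokens: lowercase, whitespace-separated (assumed scheme)."""
    return text.lower().split()


def chi_by_df(tokens_a, tokens_b):
    """Chi-squared homogeneity statistic for two token samples, divided by
    its degrees of freedom (vocabulary size - 1). Lower means more similar."""
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = sum(count_a.values()), sum(count_b.values())
    vocab = set(count_a) | set(count_b)
    if total_a == 0 or total_b == 0 or len(vocab) < 2:
        raise ValueError("need two non-empty samples with at least two distinct tokens")
    grand_total = total_a + total_b
    chi2 = 0.0
    for tok in vocab:
        pooled = count_a[tok] + count_b[tok]
        for observed, total in ((count_a[tok], total_a), (count_b[tok], total_b)):
            # Expected count if both samples shared one token distribution.
            expected = pooled * total / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / (len(vocab) - 1)


def classify(message, spam_text, ham_text, tokenize):
    """Label the message by whichever reference text it is closer to."""
    msg = tokenize(message)
    spam_score = chi_by_df(msg, tokenize(spam_text))
    ham_score = chi_by_df(msg, tokenize(ham_text))
    return "spam" if spam_score < ham_score else "ham"


# Toy reference texts, invented for illustration only.
SPAM = "win cash now win prizes now"
HAM = "meeting notes attached see agenda"

word_label = classify("win cash prizes", SPAM, HAM, word_tokens)
char_label = classify("win cash prizes", SPAM, HAM, char_ngrams)
```

Swapping the `tokenize` argument is the only change needed to move between word-level and character-level statistics, which is the comparison the abstract describes. A real filter would use large spam and legitimate-mail corpora and would compare the score against a significance threshold instead of simply choosing the smaller value.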