A comparison of event models for Naive Bayes anti-spam e-mail filtering

Authors: Karl-Michael Schneider
Affiliations: University of Passau, Innstr, Passau
Venue: EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Year: 2003

Abstract

We describe experiments with a Naive Bayes text classifier in the context of anti-spam e-mail filtering, using two different statistical event models: a multi-variate Bernoulli model and a multinomial model. We introduce a family of feature ranking functions for feature selection in the multinomial event model that take word frequency information into account. We present evaluation results on two publicly available corpora of legitimate and spam e-mails. We find that the multinomial model is less biased towards one class and achieves slightly higher accuracy than the multi-variate Bernoulli model.
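
To make the difference between the two event models concrete, the following is a minimal sketch in Python, not the paper's implementation. The toy messages, the Laplace (add-one) smoothing, and all function names (`bernoulli_log_score`, `multinomial_log_score`, `classify`) are assumptions chosen for illustration. In the multi-variate Bernoulli model each vocabulary word contributes one presence/absence event per message; in the multinomial model each word occurrence is an event, so repeated words change the score.

```python
import math
from collections import Counter

# Toy, made-up training data (an assumption; not taken from the paper's corpora).
TRAIN = [
    ("spam", "win money win prize".split()),
    ("spam", "cheap money offer".split()),
    ("ham", "meeting schedule today".split()),
    ("ham", "project meeting notes money".split()),
]
VOCAB = sorted({w for _, doc in TRAIN for w in doc})
CLASSES = sorted({label for label, _ in TRAIN})


def log_prior(c):
    """Log of the fraction of training messages that belong to class c."""
    n_c = sum(1 for label, _ in TRAIN if label == c)
    return math.log(n_c / len(TRAIN))


def bernoulli_log_score(doc, c):
    """Multi-variate Bernoulli: one binary event per vocabulary word."""
    docs_c = [set(d) for label, d in TRAIN if label == c]
    present = set(doc)
    score = log_prior(c)
    for w in VOCAB:
        # Laplace-smoothed fraction of class-c messages that contain w.
        p = (1 + sum(w in d for d in docs_c)) / (2 + len(docs_c))
        # Absent words contribute too, via (1 - p).
        score += math.log(p if w in present else 1 - p)
    return score


def multinomial_log_score(doc, c):
    """Multinomial: one event per word occurrence, so frequency matters.

    Class-independent factors (multinomial coefficient, length prior)
    are omitted because they do not affect the comparison between classes.
    """
    counts = Counter(w for label, d in TRAIN if label == c for w in d)
    total = sum(counts.values())
    score = log_prior(c)
    for w in doc:
        if w not in VOCAB:
            continue  # ignore out-of-vocabulary words in this sketch
        # Laplace-smoothed share of w among all word occurrences in class c.
        score += math.log((1 + counts[w]) / (len(VOCAB) + total))
    return score


def classify(doc, score_fn):
    """Return the class with the highest log score under the given model."""
    return max(CLASSES, key=lambda c: score_fn(doc, c))


if __name__ == "__main__":
    msg = "win money money".split()
    print(classify(msg, bernoulli_log_score))
    print(classify(msg, multinomial_log_score))
```

Repeating a word leaves the Bernoulli score unchanged but lowers or raises the multinomial score, which is why feature ranking for the multinomial model can make use of word frequency information.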