Evaluation of Approaches for Dimensionality Reduction Applied with Naive Bayes Anti-Spam Filters

Authors:
Tiago A. Almeida;Akebo Yamakami;Jurandy Almeida
Affiliations:
-;-;-
Venue:
ICMLA '09 Proceedings of the 2009 International Conference on Machine Learning and Applications
Year:
2009

Abstract

Several approaches can automatically detect e-mail spam, and the best-known ones are based on Bayesian decision theory. However, most of these approaches share the same difficulty: the high dimensionality of the feature space. Many term selection methods have been proposed in the literature. Nevertheless, it is still unclear how the performance of naive Bayes anti-spam filters depends on the method used to reduce the dimensionality of the feature space. In this paper, we compare the most popular term selection techniques (document frequency, information gain, mutual information, the χ² statistic, and odds ratio) when used to reduce the dimensionality of the term space for four well-known versions of the naive Bayes spam filter.
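
As an illustration of what term selection means in practice, the following is a minimal sketch of one of the techniques named above, the χ² statistic, computed from a 2x2 contingency table of term presence against class. It is not the authors' implementation: the function names `chi2_scores` and `select_terms`, the `"spam"`/`"ham"` label strings, and the toy messages are all assumptions made for the example.

```python
from collections import Counter


def chi2_scores(docs, labels):
    """Score each term by the chi-square statistic for a spam/ham split.

    docs   -- list of token lists (one per message)
    labels -- parallel list of "spam" or "ham" (hypothetical label names)
    """
    n = len(docs)
    n_spam = sum(1 for y in labels if y == "spam")
    n_ham = n - n_spam

    # Document frequency of each term within each class.
    spam_df, ham_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = spam_df if y == "spam" else ham_df
        target.update(set(tokens))  # count each term once per message

    scores = {}
    for term in set(spam_df) | set(ham_df):
        a = spam_df[term]   # spam messages containing the term
        b = ham_df[term]    # ham messages containing the term
        c = n_spam - a      # spam messages without the term
        d = n_ham - b       # ham messages without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_terms(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi2_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy data, invented for illustration only.
docs = [["free", "money"], ["free", "offer"],
        ["meeting", "today"], ["lunch", "today"]]
labels = ["spam", "spam", "ham", "ham"]
selected = select_terms(docs, labels, 2)
```

The reduced vocabulary in `selected` would then be the feature set passed to a naive Bayes filter. Swapping in another scoring function (for example document frequency or information gain) changes only the body of `chi2_scores`.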