Probabilistic anti-spam filtering with dimensionality reduction

Authors:
Tiago A. Almeida; Akebo Yamakami; Jurandy Almeida
Affiliations:
University of Campinas, Campinas, SP, Brazil; University of Campinas, Campinas, SP, Brazil; University of Campinas, Campinas, SP, Brazil
Venue:
Proceedings of the 2010 ACM Symposium on Applied Computing
Year:
2010

Abstract

One of the biggest problems of e-mail communication is the massive delivery of spam messages. Every day, billions of unwanted messages are sent by spammers, and this number keeps growing. Fortunately, there are different approaches able to automatically detect and remove most of these messages, and some of the best-known ones are based on Bayesian decision theory. However, many machine learning techniques applied to text categorization share the same difficulty: the high dimensionality of the feature space. Many term selection methods have been proposed in the literature to address it. Nevertheless, it is still unclear how the performance of naive Bayes anti-spam filters depends on the method applied for reducing the dimensionality of the feature space. In this paper, we compare the performance of the most popular term selection techniques when combined with some variations of the original naive Bayes anti-spam filter.
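
To make the setting concrete, the following is a minimal sketch of the general pipeline the abstract refers to: rank terms with a selection criterion, keep the top-scoring ones, then train a naive Bayes filter on the reduced feature space. The abstract does not name specific methods, so the choices here are assumptions for illustration only: information gain as the term selection criterion, a Bernoulli (Boolean-attribute) naive Bayes variant with Laplace smoothing, and a tiny hypothetical corpus. All function names are made up for this sketch.

```python
import math

# Toy corpus (hypothetical data, for illustration only).
TRAIN = [
    ("win cash prize now", "spam"),
    ("cheap cash offer now", "spam"),
    ("win free prize offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report for review", "ham"),
    ("monday project meeting now", "ham"),
]


def tokenize(text):
    """Represent a message as the set of lowercase terms it contains."""
    return set(text.lower().split())


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(docs, term):
    """Information gain of the Boolean feature 'term occurs in the message'."""
    n = len(docs)
    classes = sorted({c for _, c in docs})
    gain = entropy([sum(1 for _, c in docs if c == k) / n for k in classes])
    for present in (True, False):
        subset = [c for toks, c in docs if (term in toks) == present]
        if subset:
            cond = [subset.count(k) / len(subset) for k in classes]
            gain -= len(subset) / n * entropy(cond)
    return gain


def select_terms(docs, k):
    """Keep the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = sorted(set().union(*(toks for toks, _ in docs)))
    return sorted(vocab, key=lambda t: (-information_gain(docs, t), t))[:k]


def train_bernoulli_nb(docs, terms):
    """Estimate class priors and P(term present | class) with Laplace smoothing."""
    model = {}
    for k in sorted({c for _, c in docs}):
        in_class = [toks for toks, c in docs if c == k]
        prior = len(in_class) / len(docs)
        p = {t: (sum(t in toks for toks in in_class) + 1) / (len(in_class) + 2)
             for t in terms}
        model[k] = (prior, p)
    return model


def classify(model, terms, text):
    """Return the class with the highest log-posterior over the selected terms."""
    toks = tokenize(text)
    scores = {}
    for k, (prior, p) in model.items():
        score = math.log(prior)
        for t in terms:
            score += math.log(p[t] if t in toks else 1 - p[t])
        scores[k] = score
    return max(scores, key=scores.get)


DOCS = [(tokenize(text), label) for text, label in TRAIN]
TERMS = select_terms(DOCS, 8)           # reduced feature space
MODEL = train_bernoulli_nb(DOCS, TERMS)
```

Calling `classify(MODEL, TERMS, "win cash now")` scores a new message using only the selected terms; terms outside `TERMS` are ignored. Swapping `information_gain` for another scoring function, or the Bernoulli model for another naive Bayes variant, is the kind of combination the comparison concerns.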