Effect of feature selection methods on machine learning classifiers for detecting email spams

Authors:
Shrawan Kumar Trivedi;Shubhamoy Dey
Affiliations:
Indian Institute of Management Prabandh Shikhar, Rau Indore, India;Indian Institute of Management Prabandh Shikhar, Rau Indore, India
Venue:
Proceedings of the 2013 Research in Adaptive and Convergent Systems
Year:
2013

Abstract

This research examines how features selected by two feature selection methods, Genetic Search and Greedy Stepwise Search, affect popular machine learning classifiers: Bayesian, Naive Bayes, Support Vector Machine, and Genetic Algorithm. Tests were performed on two publicly available spam email datasets, "Enron" and "SpamAssassin". The results show that Greedy Stepwise Search is a good feature selection method for spam email detection. Among the classifiers, Support Vector Machine was found to be the best in terms of both accuracy and false positive rate.
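
As a rough illustration of the ideas named in the abstract, the sketch below runs a forward greedy stepwise search over a toy set of word-presence features and reports accuracy and false positive rate. It is not the authors' implementation: the six-message dataset, the feature names, and the `any_word_rule` stand-in classifier are all hypothetical, and the abstract does not say which search direction or stopping rule was used, so forward search that stops when no single feature improves the score is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy data: each row marks presence (1) or absence (0) of a word.
FEATURES = ["free", "meeting", "winner", "the"]  # assumed feature names
X = [
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = legitimate


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of messages classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of legitimate messages wrongly flagged as spam."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0


def any_word_rule(selected: List[int]) -> List[int]:
    """Stand-in classifier (not from the paper): flag a message as spam
    if it contains any of the selected words."""
    return [1 if any(row[f] for f in selected) else 0 for row in X]


def greedy_stepwise(
    n_features: int, score: Callable[[List[int]], float]
) -> Tuple[List[int], float]:
    """Forward greedy search: repeatedly add the single feature that raises
    the score the most, and stop when no remaining feature improves it."""
    selected: List[int] = []
    best = score(selected)
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        scored = [(score(selected + [f]), f) for f in candidates]
        top_score, top_f = max(scored, key=lambda s: s[0])
        if top_score <= best:
            break
        selected.append(top_f)
        best = top_score
    return selected, best


selected, best = greedy_stepwise(
    len(FEATURES), lambda subset: accuracy(y, any_word_rule(subset))
)
fpr = false_positive_rate(y, any_word_rule(selected))
print([FEATURES[f] for f in selected], best, fpr)
```

In a real experiment the scoring function would wrap a trained classifier evaluated on held-out messages rather than a fixed rule scored on its own training data.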