Effect of feature selection methods on machine learning classifiers for detecting email spams
Proceedings of the 2013 Research in Adaptive and Convergent Systems
Detecting spam emails within a set of email files has become a challenging task for researchers. An effective classifier is identified not only by high detection accuracy but also by a low false alarm rate and by the need to use as few features as possible. In view of these challenges, this research examines the effect of features selected by four feature subset selection methods (Genetic, Greedy Stepwise, Best First, and Rank Search) on popular machine learning classifiers: Bayesian, Naive Bayes, Support Vector Machine, Genetic Algorithm, J48, and Random Forest. Tests were performed on three publicly available spam email datasets: "Enron", "SpamAssassin", and "LingSpam". The results show that Greedy Stepwise Search is a good method for feature subset selection in spam email detection. Among the classifiers, Support Vector Machine was found to be the best in terms of both accuracy and false positive rate, although the results of Random Forest were very close to those of Support Vector Machine. The Genetic classifier was identified as a weak classifier.