The flooding of email servers with spam is an arms-race problem, and filtering spam from legitimate messages remains an active research topic. Among the methods proposed, machine-learning algorithms have been the most successful at spam filtering. However, the high dimensionality of the feature space after preprocessing is a major hurdle for the classifier, and an excessive number of features can also degrade classification results. In this paper, we therefore propose a two-stage feature selection approach based on the Taguchi method, which reduces the dimensionality of the feature space while preserving good classification performance for spam filtering. First, we apply Gini Index feature selection to reduce the number of terms; second, we apply the Taguchi method to assist the Gini Index and PSO-SVM stages in selecting the best combination of parameter settings. The method is trained and tested on the Lingspam dataset, and its performance is compared with traditional feature selection and with recent work by another researcher. The results show that the proposed method achieves good precision and recall with the lowest number of features.
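To illustrate the first stage, the sketch below scores terms with a Gini-Index-style measure and keeps the top-k terms. It is a minimal, self-contained example assuming one common formulation from the text-classification literature, GI(t) = Σ_c P(t|c)² · P(c|t)²; the paper may use a different variant, and the function and variable names here are illustrative, not taken from the paper.

```python
from collections import Counter, defaultdict

def gini_index_scores(docs, labels):
    """Score each term t with GI(t) = sum_c P(t|c)^2 * P(c|t)^2.

    `docs` is a list of tokenized documents; `labels` gives each
    document's class (e.g. "spam" / "ham"). Probabilities are
    estimated from document frequencies. This is one common
    Gini Index variant, used here as an illustrative assumption.
    """
    classes = set(labels)
    df_tc = defaultdict(Counter)  # per-class document frequency of each term
    df_t = Counter()              # overall document frequency of each term
    n_c = Counter(labels)         # number of documents per class
    for doc, label in zip(docs, labels):
        for term in set(doc):     # count each term once per document
            df_tc[label][term] += 1
            df_t[term] += 1
    scores = {}
    for term in df_t:
        s = 0.0
        for c in classes:
            p_t_given_c = df_tc[c][term] / n_c[c]      # P(t|c)
            p_c_given_t = df_tc[c][term] / df_t[term]  # P(c|t)
            s += (p_t_given_c ** 2) * (p_c_given_t ** 2)
        scores[term] = s
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring terms as the reduced feature set."""
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]
```

In this formulation a term that occurs in every document of exactly one class scores 1.0 (maximally class-indicative), while a term spread evenly across classes scores low; the second stage described above would then tune k and the PSO-SVM parameters jointly via a Taguchi orthogonal-array design.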