The flooding of email servers with spam is an arms-race problem, and filtering spam from legitimate messages remains an active research topic. Among the methods proposed, machine-learning algorithms have been the most successful at spam filtering. However, the high dimensionality of the feature space after preprocessing is a major hurdle for the classifier, and an excessive number of features can also degrade classification results. In this paper, we therefore propose a two-stage feature selection approach based on the Taguchi method, which reduces the dimensionality of the feature space while preserving good classification performance for spam filtering. First, we apply Gini Index feature selection to reduce the number of terms; second, we apply the Taguchi method to assist the Gini Index and PSO-SVM stages in selecting the best combination of parameter settings. The method is trained and tested on the Lingspam dataset, and its performance is compared with traditional feature selection and with recent work by another researcher. The results show that the proposed method achieves good precision and recall with the lowest number of features.
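To illustrate the first stage, the sketch below scores terms with a Gini-Index-style measure and keeps the top-k terms. It is a minimal, self-contained example assuming one common formulation from the text-classification literature, GI(t) = Σ_c P(t|c)² · P(c|t)²; the paper may use a different variant, and the function and variable names here are illustrative, not taken from the paper.

```python
from collections import Counter, defaultdict

def gini_index_scores(docs, labels):
    """Score each term t with GI(t) = sum_c P(t|c)^2 * P(c|t)^2.

    `docs` is a list of tokenized documents; `labels` gives each
    document's class (e.g. "spam" / "ham"). Probabilities are
    estimated from document frequencies. This is one common
    Gini Index variant, used here as an illustrative assumption.
    """
    classes = set(labels)
    df_tc = defaultdict(Counter)  # per-class document frequency of each term
    df_t = Counter()              # overall document frequency of each term
    n_c = Counter(labels)         # number of documents per class
    for doc, label in zip(docs, labels):
        for term in set(doc):     # count each term once per document
            df_tc[label][term] += 1
            df_t[term] += 1
    scores = {}
    for term in df_t:
        s = 0.0
        for c in classes:
            p_t_given_c = df_tc[c][term] / n_c[c]      # P(t|c)
            p_c_given_t = df_tc[c][term] / df_t[term]  # P(c|t)
            s += (p_t_given_c ** 2) * (p_c_given_t ** 2)
        scores[term] = s
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring terms as the reduced feature set."""
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]
```

In this formulation a term that occurs in every document of exactly one class scores 1.0 (maximally class-indicative), while a term spread evenly across classes scores low; the second stage described above would then tune k and the PSO-SVM parameters jointly via a Taguchi orthogonal-array design.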