SVM classifier incorporating feature selection using GA for spam detection

  • Authors:
  • Huai-bin Wang; Ying Yu; Zhen Liu

  • Affiliations:
  • Dept. of Computer Science, Tianjin University of Technology, Tianjin, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Nagasaki Institute of Applied Science, Nagasaki, Japan

  • Venue:
  • EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
  • Year:
  • 2005

Quantified Score

Hi-index 0.01

Abstract

The use of SVMs (Support Vector Machines) to classify e-mail as spam or non-spam, with feature selection performed by a GA (Genetic Algorithm), is investigated. A GA approach, named GA-SVM, is adopted to select the features that are most favorable to the SVM classifier. A scaling factor is used to measure the relevance of each feature to the classification task and is estimated by the GA. A heavy-bias operator is introduced in the GA to promote sparsity in the scaling factors of the features. Feature selection is then performed by eliminating irrelevant features whose scaling factor is zero. Experimental results on the UCI Spam database show that, compared with the original SVM classifier, GA-SVM reduces the number of support vectors while achieving better classification results.
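
The idea in the abstract can be illustrated with a minimal sketch in Python. This is not the authors' implementation, and several parts are assumptions. The abstract does not define the heavy-bias operator, so `heavy_bias_mutate` assumes it is a mutation that zeroes a gene with elevated probability. A nearest-centroid classifier (`centroid_accuracy`) stands in for the SVM so the example needs only the standard library. The function names, the parameters (`p_zero`, `p_perturb`, `pop_size`, `generations`), and the small sparsity reward in the fitness are all hypothetical.

```python
import random


def heavy_bias_mutate(scales, rng, p_zero=0.2, p_perturb=0.1):
    """Assumed zero-biased mutation: zero a gene with probability p_zero,
    otherwise perturb it with probability p_perturb (clipped to [0, 1])."""
    out = []
    for s in scales:
        r = rng.random()
        if r < p_zero:
            out.append(0.0)
        elif r < p_zero + p_perturb:
            out.append(min(1.0, max(0.0, s + rng.gauss(0.0, 0.2))))
        else:
            out.append(s)
    return out


def crossover(a, b, rng):
    """Uniform crossover: each gene is taken from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def centroid_accuracy(scales, X, y):
    """Stand-in for the SVM: training accuracy of a nearest-centroid
    classifier after multiplying each feature by its scaling factor."""
    scaled = [[s * v for s, v in zip(scales, row)] for row in X]
    centroids = {}
    for label in set(y):
        rows = [r for r, l in zip(scaled, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    return sum(predict(r) == l for r, l in zip(scaled, y)) / len(y)


def select_features(X, y, fitness=centroid_accuracy,
                    pop_size=20, generations=30, seed=0):
    """Evolve one scaling factor per feature; features whose factor
    ends at exactly zero are treated as eliminated."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]

    def score(ind):
        # Assumed tie-breaker: a small reward for each zeroed factor.
        return fitness(ind, X, y) + 0.01 * ind.count(0.0) / n

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = [
            heavy_bias_mutate(
                crossover(rng.choice(elite), rng.choice(elite), rng), rng
            )
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children

    best = max(pop, key=score)
    kept = [i for i, s in enumerate(best) if s != 0.0]
    return best, kept
```

Calling `select_features(X, y)` on a list of feature rows `X` and labels `y` returns the best scaling vector together with the indices of the features whose factor is non-zero. To approximate the setup described in the abstract more closely, the `fitness` argument could be replaced by a function that trains an SVM on the scaled features and returns a validation score.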