Feature selection and parameter optimization for support vector machines: A new approach based on genetic algorithm with feature chromosomes

  • Authors:
  • Mingyuan Zhao;Chong Fu;Luping Ji;Ke Tang;Mingtian Zhou

  • Affiliations:
  • School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China (all authors)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.06

Abstract

Support vector machines (SVMs) are an emerging data classification technique with many diverse applications. Both the choice of feature subset and the parameter settings used in SVM training significantly influence classification accuracy. In this paper, the asymptotic behaviors of support vector machines are combined with a genetic algorithm (GA) to generate feature chromosomes, which direct the GA's search toward the straight line of optimal generalization error in the hyperparameter space. On this basis, a new approach, termed GA with feature chromosomes, is proposed to optimize the feature subset and the SVM parameters simultaneously. The approach is evaluated on several real-world datasets from the UCI database and from the Benchmark database. Compared with the GA without feature chromosomes, grid search, and other approaches, the proposed approach not only achieves higher classification accuracy and smaller feature subsets but also requires less processing time.
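
To make the idea of simultaneous optimization concrete, here is a minimal Python sketch of a plain GA whose chromosome holds a feature mask together with two SVM parameters. It is not the authors' implementation and does not attempt to reproduce the feature chromosomes themselves, whose construction the abstract does not describe. The assumptions are: an RBF-style parameter pair (C, gamma) searched on a log2 scale, the ranges and operator choices shown, and every name used (`N_FEATURES`, `evolve`, `toy_fitness`, and so on). `toy_fitness` is a hypothetical stand-in for the cross-validated SVM accuracy that a real run would compute.

```python
import random

N_FEATURES = 8                    # hypothetical number of candidate features
LOG2_C_RANGE = (-5.0, 15.0)       # assumed search range for log2(C)
LOG2_GAMMA_RANGE = (-15.0, 3.0)   # assumed search range for log2(gamma)


def clamp(value, low, high):
    """Keep a real-valued gene inside its search range."""
    return max(low, min(high, value))


def random_chromosome(rng):
    """One chromosome = (feature bit mask, log2 C, log2 gamma)."""
    mask = [rng.randint(0, 1) for _ in range(N_FEATURES)]
    return mask, rng.uniform(*LOG2_C_RANGE), rng.uniform(*LOG2_GAMMA_RANGE)


def decode(chrom):
    """Translate a chromosome into selected feature indices, C and gamma."""
    mask, log2_c, log2_gamma = chrom
    selected = [i for i, bit in enumerate(mask) if bit]
    return selected, 2.0 ** log2_c, 2.0 ** log2_gamma


def crossover(a, b, rng):
    """One-point crossover on the mask, averaging on the real-valued genes."""
    cut = rng.randrange(1, len(a[0]))
    return a[0][:cut] + b[0][cut:], (a[1] + b[1]) / 2, (a[2] + b[2]) / 2


def mutate(chrom, rng, rate=0.1):
    """Flip mask bits with probability `rate`; jitter the parameters."""
    mask = [bit ^ 1 if rng.random() < rate else bit for bit in chrom[0]]
    log2_c = clamp(chrom[1] + rng.gauss(0, 1), *LOG2_C_RANGE)
    log2_gamma = clamp(chrom[2] + rng.gauss(0, 1), *LOG2_GAMMA_RANGE)
    return mask, log2_c, log2_gamma


def evolve(fitness, generations=30, pop_size=20, seed=0):
    """Elitist GA: keep the better half, refill with mutated offspring."""
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=fitness)


def toy_fitness(chrom):
    """Placeholder objective, NOT an SVM: pretends the first three features
    are informative, penalizes extra features, and prefers parameters near
    an arbitrary point. Replace with cross-validated SVM accuracy."""
    mask, log2_c, log2_gamma = chrom
    reward = sum(mask[:3]) - 0.05 * sum(mask[3:])
    return reward - 0.01 * abs(log2_c - 5.0) - 0.01 * abs(log2_gamma + 3.0)


if __name__ == "__main__":
    selected, C, gamma = decode(evolve(toy_fitness))
    print("selected features:", selected)
    print("C =", C, "gamma =", gamma)
```

Because the mask and the parameters sit in one chromosome, a single fitness evaluation scores a feature subset and a parameter setting together. This joint search is what the abstract contrasts with grid search over the parameters alone.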