Support vector machines training data selection using a genetic algorithm

  • Authors:
  • Michal Kawulok;Jakub Nalepa

  • Affiliations:
  • Institute of Informatics, Silesian University of Technology, Gliwice, Poland;Institute of Informatics, Silesian University of Technology, Gliwice, Poland

  • Venue:
  • SSPR'12/SPR'12 Proceedings of the 2012 Joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents a new method for selecting valuable training data for support vector machines (SVM) from large, noisy sets using a genetic algorithm (GA). SVM training data selection is a known, however not extensively investigated problem. The existing methods rely mainly on analyzing the geometric properties of the data or adapt a randomized selection, and to the best of our knowledge, GA-based approaches have not been applied for this purpose yet. Our work was inspired by the problems encountered when using SVM for skin segmentation. Due to a very large set size, the existing methods are too time-consuming, and random selection is not effective because of the set noisiness. In the work reported here we demonstrate how a GA can be used to optimize the training set, and we present extensive experimental results which confirm that the new method is highly effective for real-world data.