Selecting valuable training samples for SVMs via data structure analysis

  • Authors:
  • Defeng Wang;Lin Shi

  • Affiliations:
  • Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong;Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong

  • Venue:
  • Neurocomputing
  • Year:
  • 2008

Abstract

Despite their salient properties and wide acceptance, support vector machines (SVMs) still scale poorly, because solving the quadratic programming (QP) problem in SVM training is especially costly on large training sets. This paper presents a new algorithm, sample reduction by data structure analysis (SR-DSA), that improves the scalability of SVMs. SR-DSA uses information about the structure of the data to select the data points that are valuable for learning the separating plane. Because the method runs entirely before SVM training, it avoids the problem suffered by most sample reduction methods, which choose samples by repeatedly training SVMs. Experiments on both synthetic and real-world datasets show that SR-DSA reduces both the number of samples and the SVM training time while maintaining high testing accuracy.
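The abstract does not give the steps of SR-DSA, so the following is not the authors' algorithm. It is a minimal sketch of the general idea of choosing training samples before any SVM is fitted, using a simple stand-in heuristic: within each class, keep the points closest to a point of another class, since those are the ones most likely to shape the separating plane. The function name `select_boundary_samples` and the parameter `keep_fraction` are hypothetical, introduced only for illustration.

```python
import math


def select_boundary_samples(X, y, keep_fraction=0.5):
    """Return sorted indices of the samples kept for training.

    Stand-in heuristic (an assumption, not SR-DSA): for each class, keep the
    keep_fraction of points whose nearest other-class point is closest.
    No SVM is trained during selection.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    kept = []
    for label in sorted(set(y)):
        own = [i for i, t in enumerate(y) if t == label]
        other = [i for i, t in enumerate(y) if t != label]
        if not other:
            # Only one class present: nothing to compare against, keep all.
            kept.extend(own)
            continue
        # Rank this class's points by distance to the nearest other-class point.
        ranked = sorted(
            own,
            key=lambda i: min(math.dist(X[i], X[j]) for j in other),
        )
        n_keep = max(1, math.ceil(keep_fraction * len(own)))
        kept.extend(ranked[:n_keep])
    return sorted(kept)


# Two well-separated 1-D clusters laid out along the x-axis.
X = [(0, 0), (1, 0), (2, 0), (3, 0), (7, 0), (8, 0), (9, 0), (10, 0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
kept_indices = select_boundary_samples(X, y, keep_fraction=0.5)
print(kept_indices)  # [2, 3, 4, 5]
```

The kept indices would then be used to subset the training set passed to whatever SVM trainer is in use. This brute-force version compares every pair of opposite-class points, so its cost grows quadratically with the number of samples; it illustrates the pre-training selection idea only and makes no claim about the accuracy or speed-up reported in the paper.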