Training data selection for support vector machines

  • Authors:
  • Jigang Wang;Predrag Neskovic;Leon N. Cooper

  • Affiliations:
  • Institute for Brain and Neural Systems, Physics Department, Brown University, Providence, RI;Institute for Brain and Neural Systems, Physics Department, Brown University, Providence, RI;Institute for Brain and Neural Systems, Physics Department, Brown University, Providence, RI

  • Venue:
  • ICNC'05 Proceedings of the First International Conference on Advances in Natural Computation - Volume Part I
  • Year:
  • 2005

Abstract

In recent years, support vector machines (SVMs) have become a popular tool for pattern recognition and machine learning. Training an SVM involves solving a constrained quadratic programming problem, which requires large amounts of memory and training time for large-scale problems. The resulting SVM decision function, however, is fully determined by a small subset of the training data, called support vectors. It is therefore desirable to remove from the training set, before training, the data that are irrelevant to the final decision function. In this paper we propose two new methods that select a subset of the data for SVM training. Using real-world datasets, we compare the proposed data selection strategies in terms of their ability to reduce the training set size while maintaining the generalization performance of the resulting SVM classifiers. Our experimental results show that a significant amount of training data can be removed by the proposed methods without degrading the performance of the resulting SVM classifiers.
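
For readers who want the reasoning behind the claim that only the support vectors matter, the standard soft-margin SVM dual can be sketched as follows. The notation is the usual textbook one and is an assumption here, not taken from the abstract: n training pairs (x_i, y_i) with labels y_i in {-1, +1}, a kernel K, a penalty parameter C, dual variables \alpha_i, and a bias b.

```latex
% Dual quadratic program solved during training (textbook form, assumed notation).
% The kernel matrix K(x_i, x_j) has n^2 entries, which is where the memory cost comes from.
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .

% Resulting decision function. Terms with \alpha_i = 0 vanish, so the sum
% runs effectively over the support vectors S = \{\, i : \alpha_i > 0 \,\} only.
f(x) = \operatorname{sgn}\!\Big( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \Big)
     = \operatorname{sgn}\!\Big( \sum_{i \in S} \alpha_i y_i K(x_i, x) + b \Big).
```

Because every training point with \alpha_i = 0 drops out of the sum, discarding such points before training would, in principle, leave the decision function unchanged. The difficulty is that the set S is known only after the quadratic program has been solved, which is why a selection method has to estimate the likely support vectors in advance. The two methods proposed in the paper are not described in the abstract and are not reproduced here.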