A fast SVM training method for very large datasets

  • Authors:
  • Boyang Li;Qiangwei Wang;Jinglu Hu

  • Affiliations:
  • Graduate School of Information, Production and Systems, Waseda University, Kitakyushu-shi, Fukuoka-ken, Japan;Graduate School of Information, Production and Systems, Waseda University, Kitakyushu-shi, Fukuoka-ken, Japan;Graduate School of Information, Production and Systems, Waseda University, Kitakyushu-shi, Fukuoka-ken, Japan

  • Venue:
  • IJCNN'09: Proceedings of the 2009 International Joint Conference on Neural Networks
  • Year:
  • 2009

Abstract

In a standard support vector machine (SVM), the training process has O(n^3) time and O(n^2) space complexity, where n is the size of the training dataset. Training is therefore computationally infeasible for very large datasets. A natural way to address this problem is to reduce the size of the training dataset. SVM classifiers depend only on the support vectors (SVs), which lie close to the separation boundary, so the samples that are likely to be SVs need to be retained. In this paper, we propose a method based on the edge detection technique to detect these samples. To preserve the overall distribution of the data, we also use a clustering algorithm such as K-means to calculate the centroids of clusters. The samples selected by the edge detector and the centroids of the clusters are used to reconstruct the training dataset. The reconstructed training dataset is smaller, which makes the training process much faster without degrading classification accuracy.
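The reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the edge detector, so the sketch assumes a simple stand-in that keeps any sample with an opposite-class point among its k nearest neighbours. The function names, the parameters `k` and `clusters_per_class`, and the choice to cluster each class separately are all hypothetical.

```python
import numpy as np


def boundary_samples(X, y, k=5):
    """Indices of samples with an opposite-class point among their k nearest neighbours.

    Assumed stand-in for the paper's edge detector, which the abstract does not specify.
    """
    # Pairwise squared distances. This costs O(n^2) memory, so it only suits a
    # small demonstration; a real pipeline would need a cheaper neighbour search.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    mixed = (y[nn] != y[:, None]).any(axis=1)
    return np.flatnonzero(mixed)


def kmeans_centroids(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's K-means; returns only the centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(n_clusters):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def reduce_training_set(X, y, k=5, clusters_per_class=3, seed=0):
    """Rebuild a smaller training set from boundary samples plus class-wise centroids."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    edge_idx = boundary_samples(X, y, k)
    X_parts, y_parts = [X[edge_idx]], [y[edge_idx]]
    # Clustering each class separately (an assumption) gives every centroid
    # an unambiguous label.
    for label in np.unique(y):
        Xc = X[y == label]
        c = kmeans_centroids(Xc, min(clusters_per_class, len(Xc)), seed=seed)
        X_parts.append(c)
        y_parts.append(np.full(len(c), label))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

Calling `reduce_training_set(X, y)` returns the reduced samples and their labels, which can then be passed to any SVM trainer in place of the full dataset. How much the dataset shrinks depends on how strongly the classes overlap and on the assumed parameters.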