Fast classification for large data sets via random selection clustering and Support Vector Machines

  • Authors:
  • Xiaoou Li; Jair Cervantes; Wen Yu

  • Affiliations:
  • Departamento de Computacion, Cinvestav-Ipn, Mexico City, Mexico; Departamento de Computacion, Cinvestav-Ipn, Mexico City, Mexico; Departamento de Control Automatico, Cinvestav-Ipn, Mexico City, Mexico

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2012

Abstract

Support Vector Machines (SVMs) are high-accuracy classifiers. However, standard SVM algorithms are unsuitable for classifying large data sets because of their training complexity. In this paper, we propose a novel SVM classification approach for large data sets. We first use random selection to choose a small subset of the training data for the first-stage SVM. Then a de-clustering technique is proposed to recover the training data for the second-stage SVM. This two-stage SVM classifier has distinct advantages in dealing with huge data sets, such as those in bioinformatics. A performance analysis is also given. Finally, we apply the proposed method to several benchmark problems. Experimental results demonstrate that this approach achieves good classification accuracy while training significantly faster than other SVM classifiers.
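The abstract does not specify the clustering rule, the SVM solver, the kernel, or exactly how de-clustering recovers data, so the following is only a minimal sketch of the two-stage idea under stated assumptions. It assumes: a few randomly selected points per class act as cluster centres, every point joins the cluster of its nearest centre, a linear SVM trained with Pegasos-style subgradient descent stands in for whatever solver the authors used, "support vectors" are centres whose functional margin is at most `1 + tol`, and de-clustering means recovering all original members of the support-vector clusters. All function and parameter names (`train_linear_svm`, `two_stage_svm`, `per_class`, `tol`) are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear soft-margin SVM via Pegasos-style subgradient steps (assumed solver).

    The bias is handled as an extra constant feature, a simplification that
    also regularizes it. Labels must be -1/+1. Returns (w, b).
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            xi = list(X[i]) + [1.0]
            violated = y[i] * dot(w, xi) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 regularizer shrink
            if violated:  # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w[:-1], w[-1]


def predict(model, x):
    w, b = model
    return 1 if dot(w, x) + b >= 0 else -1


def two_stage_svm(X, y, per_class=10, tol=0.5, seed=0):
    """Hypothetical random-selection / de-clustering two-stage SVM."""
    rng = random.Random(seed)
    # Stage 1a: randomly select a few points of each class as cluster centres.
    centres = []
    for label in (-1, 1):
        members = [i for i, yi in enumerate(y) if yi == label]
        centres += rng.sample(members, per_class)
    # Stage 1b: assign every point to the cluster of its nearest centre.
    clusters = {c: [] for c in centres}
    for i, xi in enumerate(X):
        nearest = min(centres, key=lambda c: math.dist(xi, X[c]))
        clusters[nearest].append(i)
    # Stage 1c: first-stage SVM trained on the centres only.
    stage1 = train_linear_svm([X[c] for c in centres], [y[c] for c in centres])
    w, b = stage1
    svs = [c for c in centres if y[c] * (dot(w, X[c]) + b) <= 1.0 + tol]
    # De-clustering: recover the original members of support-vector clusters.
    recovered = [i for c in svs for i in clusters[c]]
    if len({y[i] for i in recovered}) < 2:
        return stage1  # cannot refine with a single class; keep stage 1
    # Stage 2: retrain on the recovered data near the decision boundary.
    return train_linear_svm([X[i] for i in recovered], [y[i] for i in recovered])


# Usage on synthetic data: two Gaussian blobs in 2-D.
rng = random.Random(1)
X = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(400)]
X += [(rng.gauss(-2, 1), rng.gauss(-2, 1)) for _ in range(400)]
y = [1] * 400 + [-1] * 400
model = two_stage_svm(X, y)
acc = sum(predict(model, x) == t for x, t in zip(X, y)) / len(X)
print(f"training accuracy: {acc:.3f}")
```

The second-stage SVM only sees the members of clusters whose centres lie near the first-stage margin, which is where the speed-up is meant to come from; swapping in a kernel SVM would change only `train_linear_svm` and the support-vector test.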