Fast classification for large data sets via random selection clustering and Support Vector Machines

Authors:
Xiaoou Li;Jair Cervantes;Wen Yu
Affiliations:
Departamento de Computacion, Cinvestav-Ipn, Mexico City, Mexico;Departamento de Computacion, Cinvestav-Ipn, Mexico City, Mexico;Departamento de Control Automatico, Cinvestav-Ipn, Mexico City, Mexico
Venue:
Intelligent Data Analysis
Year:
2012




Abstract

Support Vector Machines (SVMs) are high-accuracy classifiers. However, standard SVM algorithms are unsuitable for classifying large data sets because of their training complexity. In this paper, we propose a novel SVM classification approach for large data sets. We first use random selection to choose a small subset of the training data for the first-stage SVM. A de-clustering technique is then proposed to recover training data for the second-stage SVM. This two-stage SVM classifier has distinct advantages in dealing with huge data sets such as those in bioinformatics. A performance analysis is also given. Finally, we apply the proposed method to several benchmark problems. Experimental results demonstrate that this approach achieves good classification accuracy while training significantly faster than other SVM classifiers.
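The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the de-clustering step, so the sketch assumes it means recovering every original point that lies within a fixed radius of the first-stage support vectors. The support vectors are themselves approximated by the lowest-margin sampled points of each class, and a simple linear SVM trained by subgradient descent stands in for a full SVM solver. All names and parameters (`two_stage_svm`, `sample_size`, `k_per_class`, `radius`) and the synthetic data are hypothetical.

```python
import random

def decision(w, b, x):
    """Value of the linear decision function w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=50, seed=0):
    """Linear soft-margin SVM fitted by subgradient descent on the hinge loss.
    Labels must be -1 or +1. A stand-in for a real SVM solver."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision(w, b, X[i]) < 1.0
            w = [wj * (1.0 - eta * lam) for wj in w]      # regularization shrink
            if violated:                                    # hinge subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def accuracy(w, b, X, y):
    hits = sum(1 for xi, yi in zip(X, y) if (decision(w, b, xi) >= 0) == (yi > 0))
    return hits / len(X)

def two_stage_svm(X, y, sample_size=200, k_per_class=5, radius=1.0, seed=0):
    rng = random.Random(seed)
    # Stage 1: random selection of a small training subset.
    sample = rng.sample(range(len(X)), sample_size)
    w1, b1 = train_linear_svm([X[i] for i in sample], [y[i] for i in sample], seed=seed)
    # Approximate support vectors (assumption): lowest-margin sampled points per class.
    anchors = []
    for label in (-1, 1):
        members = [i for i in sample if y[i] == label]
        members.sort(key=lambda i: y[i] * decision(w1, b1, X[i]))
        anchors.extend(members[:k_per_class])
    # De-clustering (assumed form): recover all points within `radius` of an anchor.
    r2 = radius * radius
    recovered = [i for i in range(len(X))
                 if any(sum((a - c) ** 2 for a, c in zip(X[i], X[j])) <= r2
                        for j in anchors)]
    # Stage 2: retrain on the recovered points only.
    w2, b2 = train_linear_svm([X[i] for i in recovered],
                              [y[i] for i in recovered], seed=seed)
    return (w1, b1), (w2, b2), recovered

def make_blobs(n_per_class=2000, seed=1):
    """Synthetic two-class data: Gaussian blobs centred at (-2,-2) and (2,2)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((-1, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y

X, y = make_blobs()
stage1, stage2, recovered = two_stage_svm(X, y)
print("stage-1 accuracy:", accuracy(*stage1, X, y))
print("stage-2 accuracy:", accuracy(*stage2, X, y))
print("second-stage training size:", len(recovered), "of", len(X))
```

In this sketch neither stage trains on the full data set: the first sees only the random sample, and the second sees only the points recovered near the first-stage boundary. That is the intuition behind the claimed training speed-up. The radius and the number of anchor points are tuning choices made here for illustration only.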