Selecting valuable training samples for SVMs via data structure analysis

Authors:
Defeng Wang;Lin Shi
Affiliations:
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong;Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
Venue:
Neurocomputing
Year:
2008


Abstract

Despite their salient properties and wide acceptance, support vector machines (SVMs) still face scalability difficulties, because solving the quadratic programming (QP) problem in SVM training is especially costly on large training sets. This paper presents a new algorithm, sample reduction by data structure analysis (SR-DSA), to improve the scalability of SVMs. SR-DSA uses data structure information to select the data points that are valuable for learning the separating plane. Because the method runs entirely before SVM training, it avoids a problem suffered by most sample reduction methods, whose sample selection depends heavily on repeated SVM training. Experiments on both synthetic and real-world datasets show that SR-DSA reduces both the number of samples and the SVM training time while maintaining high testing accuracy.
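The abstract does not spell out the SR-DSA procedure, so the following is only a minimal sketch of the general idea of choosing samples before any SVM is trained. It is not the authors' algorithm. As a stated assumption, it summarizes each class by a single centroid (a crude stand-in for real data structure analysis) and keeps, per class, the fraction of points closest to the opposing class's centroid. The function name `select_boundary_samples` and the parameter `keep_fraction` are hypothetical.

```python
import numpy as np


def select_boundary_samples(X, y, keep_fraction=0.25):
    """Return indices of samples to keep for a binary problem.

    Assumption (not from the paper): each class is summarized by one
    centroid, and the samples of a class that lie closest to the other
    class's centroid are treated as the valuable ones.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    if labels.size != 2:
        raise ValueError("this sketch handles binary problems only")

    keep = []
    for own, other in ((labels[0], labels[1]), (labels[1], labels[0])):
        own_idx = np.flatnonzero(y == own)
        other_centroid = X[y == other].mean(axis=0)
        # Distance from every sample of this class to the opposing centroid.
        dist = np.linalg.norm(X[own_idx] - other_centroid, axis=1)
        n_keep = max(1, int(np.ceil(keep_fraction * own_idx.size)))
        # Keep the n_keep samples nearest to the opposing class.
        keep.extend(own_idx[np.argsort(dist)[:n_keep]])
    return np.sort(np.array(keep))


# Synthetic two-class data: two Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

selected = select_boundary_samples(X, y, keep_fraction=0.25)
print(len(selected), "of", len(y), "samples kept")
```

The reduced set `X[selected], y[selected]` would then be passed to any SVM trainer in place of the full data. Because the selection never calls the trainer, its cost is independent of the QP solve, which is the property the abstract emphasizes.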