Support Vector Machines with Clustering for Training with Very Large Datasets

Authors:
Theodoros Evgeniou;Massimiliano Pontil
Venue:
SETN '02 Proceedings of the Second Hellenic Conference on AI: Methods and Applications of Artificial Intelligence
Year:
2002

Abstract

We present a method for training Support Vector Machine (SVM) classifiers on very large datasets. A clustering algorithm preprocesses the standard training data, and we show how SVMs can be simply extended to handle the clustered data, which effectively amounts to training with a set of weighted examples. The algorithm computes large clusters for points that are far from the decision boundary and small clusters for points near the boundary. As a result, SVMs trained on the preprocessed, clustered data set find nearly the same decision boundary, while the computational time decreases significantly. When the input dimensionality of the data is not large, for example of the order of ten, the clustering algorithm can significantly decrease the effective number of training examples, which is a useful feature for training SVMs on large data sets. Preliminary experimental results indicate the benefits of our approach.
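
The idea of training on weighted cluster representatives can be illustrated with a minimal sketch. It is not the algorithm of the paper: the clustering below uses a single fixed radius per class (a simplifying assumption, whereas the abstract describes clusters whose size depends on the distance to the decision boundary), and the classifier is a linear SVM trained by plain subgradient descent on the weighted objective 0.5*||w||^2 + (C/N) * sum_i n_i * max(0, 1 - y_i (w.x_i + b)), where n_i is the number of points in cluster i and N is the total number of points. All function names and parameter values are hypothetical.

```python
import random

def cluster_per_class(points, labels, radius):
    """Greedy fixed-radius clustering within each class (illustrative only).
    A point joins the first same-class cluster whose current centre lies
    within `radius`; otherwise it starts a new cluster."""
    sums, sizes, classes = [], [], []
    for x, y in zip(points, labels):
        for k in range(len(sums)):
            if classes[k] != y:
                continue
            centre = [s / sizes[k] for s in sums[k]]
            if sum((a - c) ** 2 for a, c in zip(x, centre)) <= radius ** 2:
                sums[k] = [s + a for s, a in zip(sums[k], x)]
                sizes[k] += 1
                break
        else:  # no suitable cluster found: open a new one
            sums.append(list(x))
            sizes.append(1)
            classes.append(y)
    centres = [[s / n for s in ss] for ss, n in zip(sums, sizes)]
    return centres, classes, sizes

def train_weighted_linear_svm(xs, ys, weights, C=1.0, epochs=500, lr=0.05):
    """Subgradient descent on the weighted hinge-loss objective.
    Each example's hinge term is scaled by its weight (the cluster size)."""
    dim, total = len(xs[0]), float(sum(weights))
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of 0.5 * ||w||^2 is w
        for x, y, n in zip(xs, ys, weights):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # example violates the margin
                scale = C * n / total
                for j in range(dim):
                    grad_w[j] -= scale * y * x[j]
                grad_b -= scale * y
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Synthetic two-class data (assumed for illustration): two Gaussian blobs.
random.seed(0)
points = [[random.gauss(2.0 * y, 0.5), random.gauss(2.0 * y, 0.5)]
          for y in (-1, 1) for _ in range(200)]
labels = [y for y in (-1, 1) for _ in range(200)]

centres, centre_labels, sizes = cluster_per_class(points, labels, radius=1.0)
w, b = train_weighted_linear_svm(centres, centre_labels, sizes)
accuracy = sum(predict(w, b, x) == y for x, y in zip(points, labels)) / len(points)
print(len(points), "points reduced to", len(centres), "weighted centres; accuracy:", accuracy)
```

The classifier is fitted only on the cluster centres, yet it is evaluated on all original points, which mirrors the claim that a boundary learned from the clustered data can stay close to the one learned from the full set. With a library SVM, the same effect would typically be obtained by passing the cluster sizes as per-example weights.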