A training algorithm for optimal margin classifiers
COLT '92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory
The nature of statistical learning theory
Machine Learning
A sparse representation for function approximation
Neural Computation
Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV
Advances in kernel methods
A tutorial on support vector machines for pattern recognition
Data Mining and Knowledge Discovery
Support vector machines and the Bayes rule in classification
Data Mining and Knowledge Discovery
Support vector machines for classification in nonstandard situations
Machine Learning
Are loss functions all the same?
Neural Computation
Local linear approximation for kernel methods: the railway kernel
CIARP '06: Proceedings of the 11th Iberoamerican Conference on Progress in Pattern Recognition, Image Analysis and Applications
Distributed monitoring with collaborative prediction
CCGRID '12: Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Coherence functions with applications in large-margin classification methods
The Journal of Machine Learning Research
Efficient distributed monitoring with active collaborative prediction
Future Generation Computer Systems
In this paper we consider the statistical aspects of support vector machines (SVMs) in the classification context, and describe an approach to adaptively tuning the smoothing parameter(s) in the SVMs. The relation between the Bayes rule of classification and the SVMs is discussed, shedding light on why the SVMs work well. This relation also reveals that the misclassification rate of the SVMs is closely related to the generalized comparative Kullback-Leibler distance (GCKL) proposed in Wahba (1999; in Schölkopf, Burges, & Smola (Eds.), Advances in Kernel Methods—Support Vector Learning, Cambridge, MA: MIT Press). The adaptive tuning is based on the generalized approximate cross validation (GACV), an easily computable proxy for the GCKL. The results are generalized to the unbalanced case, where the fraction of members of the classes in the training set differs from that in the general population, and the costs of the two kinds of misclassification error are different. The main results in this paper have been obtained in several places elsewhere; here we take the opportunity to organize them in one place and note how they fit together and reinforce one another. The work reviewed is mostly that of the authors.
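As a concrete illustration of the quantities named in the abstract, the following is a brief sketch in the standard notation of the SVM literature; the notation (and the omission of the offset term and of the precise GACV formula) is an assumption here and may differ in detail from the paper itself. With labels y in {-1, +1} and p(x) = P(Y = 1 | X = x), the SVM solves

\[ \min_{f \in \mathcal{H}_K} \; \frac{1}{n} \sum_{i=1}^{n} \bigl(1 - y_i f(x_i)\bigr)_+ \; + \; \lambda \, \|f\|_{\mathcal{H}_K}^2 , \]

and the minimizer of the population hinge risk \( E\,(1 - Y f(X))_+ \) over all f is

\[ f^*(x) = \operatorname{sign}\bigl(p(x) - \tfrac{1}{2}\bigr) , \]

which is exactly the Bayes rule: this is the sense in which the SVM targets the Bayes rule directly rather than by first estimating p(x). The GCKL evaluates the hinge risk of the fitted \( f_\lambda \) under the true distribution,

\[ \mathrm{GCKL}(\lambda) = E_{\mathrm{true}} \,\bigl(1 - Y f_\lambda(X)\bigr)_+ , \]

and since \( (1 - yf)_+ \ge 1 \) whenever \( yf \le 0 \), GCKL(\(\lambda\)) is an upper bound on the expected misclassification rate of \( f_\lambda \); the GACV is a computable, data-based proxy for this unobservable target, minimized over \(\lambda\) (and any kernel parameters) to tune the fit. In the unbalanced, unequal-cost case, weighting the hinge loss by \( w^+ \) or \( w^- \) according to the class of the observation shifts the population minimizer to

\[ f^*(x) = \operatorname{sign}\Bigl(p(x) - \frac{w^-}{\,w^+ + w^-\,}\Bigr) , \]

which matches the cost-adjusted Bayes rule.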