Generalization Performance of Classifiers in Terms of Observed Covering Numbers
EuroCOLT '99 Proceedings of the 4th European Conference on Computational Learning Theory
We derive new bounds on the generalization error of feature space machines, such as support vector machines and related regularization networks, by obtaining new bounds on their covering numbers. The proofs rest on a viewpoint that is apparently novel in the field of statistical learning theory: the hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we can theoretically explain how the choice of kernel function affects the generalization performance of support vector machines.
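The spectrum that drives these bounds can be probed numerically. The sketch below (not from the paper; the kernel, sample distribution, and bandwidth are illustrative assumptions) estimates the leading eigenvalues of the kernel integral operator from the Gram matrix of a sample: the eigenvalues of K/m approximate those of the integral operator, and their decay rate is what controls the entropy numbers, hence the covering numbers, in the framework described above.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix of the Gaussian (RBF) kernel k(x, y) = exp(-gamma * |x - y|^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

# Illustrative assumption: 200 points drawn uniformly from [-1, 1].
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))

# Eigenvalues of K/m empirically approximate the leading eigenvalues of the
# integral operator (T_k f)(x) = E_y[k(x, y) f(y)] induced by the kernel.
K = rbf_kernel(X, gamma=1.0)
eigvals = np.linalg.eigvalsh(K / len(X))[::-1]  # sorted in descending order

# A smooth kernel such as the Gaussian exhibits rapid eigenvalue decay,
# which (per the abstract's argument) yields small covering numbers and
# hence tighter generalization bounds.
print(eigvals[:5])
```

Comparing the decay for different kernels (e.g. Gaussian versus a rougher kernel) gives an empirical feel for the abstract's claim that the kernel choice shapes generalization performance.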