Kernel methods can embed finite-dimensional data into infinite-dimensional feature spaces. Despite the large underlying feature dimensionality, kernel methods can achieve good generalization ability. This observation is often misinterpreted and has been used to argue that kernel learning can magically avoid the "curse-of-dimensionality" phenomenon encountered in statistical estimation problems. This letter shows that although a kernel representation can embed data into an infinite-dimensional feature space, the effective dimensionality of this embedding, which determines the learning complexity of the underlying kernel machine, is usually small. In particular, we introduce an algebraic definition of a scale-sensitive effective dimension associated with a kernel representation. Based on this quantity, we derive upper bounds on the generalization performance of some kernel regression methods. Moreover, we show that the resulting convergence rates are optimal under various circumstances.
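To make the idea of a scale-sensitive effective dimension concrete, here is a minimal sketch. It assumes the common trace-based definition d(λ) = Σᵢ μᵢ / (μᵢ + λ), where the μᵢ are eigenvalues of the normalized kernel Gram matrix K/n and λ is a regularization scale; the letter's exact definition and normalization may differ. The function name, the RBF bandwidth, and the sample data below are illustrative, not taken from the paper.

```python
import numpy as np

def effective_dimension(K, lam):
    """Scale-sensitive effective dimension of a kernel Gram matrix K at
    scale lam: sum_i mu_i / (mu_i + lam), with mu_i eigenvalues of K/n.
    (Illustrative definition; the paper's normalization may differ.)"""
    n = K.shape[0]
    mu = np.linalg.eigvalsh(K) / n      # eigenvalues of the normalized Gram matrix
    mu = np.clip(mu, 0.0, None)         # guard against tiny negative round-off
    return np.sum(mu / (mu + lam))

# Example: an RBF kernel embeds data into an infinite-dimensional feature
# space, yet the effective dimension at any fixed scale lam stays small.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T))  # RBF kernel, bandwidth 1
for lam in (1e-1, 1e-2, 1e-3):
    print(f"lam={lam:g}: effective dimension = {effective_dimension(K, lam):.1f}")
```

Intuitively, eigendirections of the embedding whose eigenvalue μᵢ sits far below the scale λ contribute almost nothing to d(λ), which is why the effective dimension remains small even though the RBF feature space is infinite-dimensional.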