The paper studies convex stochastic optimization problems in a reproducing kernel Hilbert space (RKHS). The objective (risk) functional depends on functions from this RKHS and takes the form of a mathematical expectation (integral) of a nonnegative integrand (loss function) with respect to a probability measure. The problem is generally ill-posed, a difficulty that statistical learning addresses through Tikhonov regularization combined with Monte Carlo approximation of the integrals; this also makes it possible to reduce the problem to finite-dimensional (convex) quadratic optimization. The approximate solutions, referred to as kernel learning estimators, are expressed as linear combinations of kernels evaluated at the sample points; they are functional random variables that depend on the full sample. The paper studies the probabilistic convergence of these approximate solutions as the regularization parameter is gradually driven to zero with a growing number of observations. Its intended contribution is to derive novel nonasymptotic bounds on the minimization error and exponential bounds on the tail distribution of the errors, and to establish new sufficient conditions for uniform convergence of the kernel estimators to the true (normal) solution with probability one, together with a rule for decreasing the regularization parameter as the sample size increases. Applications to least squares, median, and quantile regression estimation, as well as to binary classification, are discussed.
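The reduction described above can be illustrated for the least-squares case. By the representer theorem, the Tikhonov-regularized empirical risk minimizer in an RKHS is a linear combination of kernels at the sample points, and the coefficients solve a finite-dimensional linear system. The following is a minimal sketch, not the paper's own implementation; the Gaussian kernel, the parameter names `lam` and `gamma`, and the toy data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Gaussian (RBF) kernel matrix: k(x, x') = exp(-gamma * ||x - x'||^2)
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def kernel_ridge_fit(X, y, lam=0.1, gamma=1.0):
    # Representer theorem: the regularized minimizer has the form
    # f(x) = sum_i alpha_i k(x_i, x), and for the squared loss the
    # coefficients solve the linear system (K + n*lam*I) alpha = y.
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return alpha

def kernel_ridge_predict(X_train, alpha, X_new, gamma=1.0):
    # Evaluate the kernel expansion at new points
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Illustrative usage: noisy observations of a sine function
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)

alpha = kernel_ridge_fit(X, y, lam=0.01)
pred = kernel_ridge_predict(X, alpha, X)
```

The regularization weight `lam` plays the role of the paper's regularization parameter: the convergence results concern letting it shrink at a suitable rate as the sample size `n` grows.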