Local minimax learning of functions with best finite sample estimation error bounds: applications to ridge and lasso regression, boosting, tree learning, kernel machines, and inverse problems

Authors: Lee K. Jones
Affiliations: Department of Mathematical Sciences, University of Massachusetts, Lowell, MA
Venue: IEEE Transactions on Information Theory
Year: 2009


Abstract

Optimal local estimation is formulated in the minimax sense for inverse problems and nonlinear regression. This theory provides best mean squared finite sample error bounds for some popular statistical learning algorithms and also for several optimal improvements of other existing learning algorithms such as smoothing splines and kernel regularization. The bounds and improved algorithms are not based on asymptotics or Bayesian assumptions and are truly local for each query, not depending on cross-validating estimates at other queries to optimize modeling parameters. Results are given for optimal local learning of approximately linear functions with side information (context) using real algebraic geometry. In particular, finite sample error bounds are given for ridge regression and for a local version of lasso regression. The new regression methods require only quadratic programming with linear or quadratic inequality constraints for implementation. Greedy additive expansions are then combined with local minimax learning via a change in metric. An optimal strategy is presented for fusing the local minimax estimators of a class of experts, providing optimal finite sample prediction error bounds from (random) forests. Local minimax learning is extended to kernel machines. Best local prediction error bounds for finite samples are given for Tikhonov regularization. The geometry of reproducing kernel Hilbert space is used to derive improved estimators with finite sample mean squared error (MSE) bounds for class membership probability in two-class pattern classification problems. A purely local, cross-validation-free algorithm is proposed which uses Fisher information with these bounds to determine best local kernel shape in vector machine learning. Finally, a locally quadratic solution to the finite Fourier moments problem is presented. After reading the first three sections, the reader may proceed directly to any of the subsequent applications sections.
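
To make concrete what an estimate that is "local for each query" looks like, here is a minimal sketch of a generic locally weighted ridge fit in one dimension. It is not the paper's minimax estimator and carries none of its error bounds. The function name `local_ridge_predict`, the Gaussian weighting, and the fixed `bandwidth` and `ridge` values are all assumptions chosen for illustration. The fit is recomputed from scratch at each query point, and no estimates at other queries are used.

```python
import math


def local_ridge_predict(xs, ys, query, bandwidth=0.5, ridge=1e-3):
    """Predict y at `query` from a locally weighted ridge fit (1-D inputs).

    Illustrative only: Gaussian weights and a fixed ridge penalty on the
    slope are assumptions, not the construction used in the paper.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    if bandwidth <= 0 or ridge < 0:
        raise ValueError("bandwidth must be positive and ridge non-negative")

    # Weighted moments of the data, with inputs centered at the query so
    # that the fitted intercept is the prediction at the query itself.
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        z = x - query
        w = math.exp(-0.5 * (z / bandwidth) ** 2)  # Gaussian locality weight
        s0 += w
        s1 += w * z
        s2 += w * z * z
        t0 += w * y
        t1 += w * z * y

    # Normal equations for y ~ a + b*z with the penalty ridge * b**2:
    #   [s0  s1        ] [a]   [t0]
    #   [s1  s2 + ridge] [b] = [t1]
    det = s0 * (s2 + ridge) - s1 * s1
    if det <= 0.0:
        raise ValueError("degenerate local fit; add data or increase ridge")

    # Cramer's rule for the intercept a, which is the local prediction.
    return (t0 * (s2 + ridge) - s1 * t1) / det
```

For example, `local_ridge_predict([0.0, 0.5, 1.0, 1.5, 2.0], ys, 0.8)` returns a prediction at the single query `0.8`. Calling it again at a different query repeats the whole fit with new weights.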