Local minimax learning of functions with best finite sample estimation error bounds: applications to ridge and lasso regression, boosting, tree learning, kernel machines, and inverse problems

  • Authors:
  • Lee K. Jones

  • Affiliations:
  • Department of Mathematical Sciences, University of Massachusetts, Lowell, MA

  • Venue:
  • IEEE Transactions on Information Theory
  • Year:
  • 2009

Quantified Score

Hi-index 754.84

Visualization

Abstract

Optimal local estimation is formulated in the minimax sense for inverse problems and nonlinear regression. This theory provides best mean squared finite sample error bounds for some popular statistical learning algorithms and also for several optimal improvements of other existing learning algorithms such as smoothing splines and kernel regularization. The bounds and improved algorithms are not based on asymptotics or Bayesian assumptions and are truly local for each query, not depending on cross validating estimates at other queries to optimize modeling parameters. Results are given for optimal local learning of approximately linear functions with side information (context) using real algebraic geometry. In particular, finite sample error bounds are given for ridge regression and for a local version of lasso regression. The new regression methods require only quadratic programming with linear or quadratic inequality constraints for implementation. Greedy additive expansions are then combined with local minimax learning via a change in metric. An optimal strategy is presented for fusing the local minimax estimators of a class of experts--providing optimal finite sample prediction error bounds from (random) forests. Local minimax learning is extended to kernel machines. Best local prediction error bounds for finite samples are given for Tikhonov regularization. The geometry of reproducing kernel Hilbert space is used to derive improved estimators with finite sample mean squared error (MSE) bounds for class membership probability in two class pattern classification problems. A purely local, cross validation free algorithm is proposed which uses Fisher information with these bounds to determine best local kernel shape in vector machine learning. Finally, a locally quadratic solution to the finite Fourier moments problem is presented. After reading the first three sections the reader may proceed directly to any of the subsequent applications sections.