Sparse additive models are families of d-variate functions with the additive decomposition f* = Σj∈S fj*, where S is an unknown subset of cardinality s. In this paper, we consider the case where each univariate component function fj* lies in a reproducing kernel Hilbert space (RKHS), and analyze a method for estimating the unknown function f* based on kernels combined with ℓ1-type convex regularization. Working within a high-dimensional framework that allows both the dimension d and the sparsity s to increase with n, we derive convergence rates in the L2(P) and L2(Pn) norms over the class Fd,s,H of sparse additive models with each univariate function fj* in the unit ball of a univariate RKHS with bounded kernel function. We complement our upper bounds by deriving minimax lower bounds on the L2(P) error, thereby showing the optimality of our method. We thus obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, and Sobolev classes. We also show that if, in contrast to our univariate conditions, the d-variate function class is assumed to be globally bounded, then much faster estimation rates are possible for any sparsity s = Ω(√n), showing that global boundedness is a significant restriction in the high-dimensional setting.
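The estimator described above combines univariate kernel function classes with ℓ1-type convex regularization to induce component-wise sparsity. The following is a minimal numpy sketch of that idea, not the paper's exact procedure: each coordinate's RKHS is replaced by a small cubic polynomial feature map, and the ℓ1-over-components penalty becomes a group lasso solved by proximal gradient descent. The data-generating functions, the penalty strength `lam`, and the iteration count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sparse additive model: of d = 10 coordinates, only
# coordinates 0 and 1 carry signal (sparsity s = 2). All choices
# here are illustrative assumptions, not from the paper.
n, d = 200, 10
X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)
y = y - y.mean()  # components are mean-zero, so remove the intercept


def phi(x):
    """Centered cubic polynomial feature map standing in for one RKHS."""
    B = np.column_stack([x, x ** 2, x ** 3])
    return B - B.mean(axis=0)


Phis = [phi(X[:, j]) for j in range(d)]  # one (n, k) block per coordinate
k = Phis[0].shape[1]
Phi = np.hstack(Phis)                    # full (n, d*k) design

# Group lasso: (1/2n)||y - Phi w||^2 + lam * sum_j ||w_j||_2,
# where w_j is coordinate j's coefficient block. The block-wise
# l2 penalty zeroes out whole component functions at once.
w = np.zeros(d * k)
lam = 0.05                                   # penalty strength (assumption)
L = np.linalg.norm(Phi, 2) ** 2 / n          # Lipschitz constant of the gradient
step = 1.0 / L

for _ in range(2000):
    grad = Phi.T @ (Phi @ w - y) / n
    z = w - step * grad
    # Proximal step: block soft-thresholding of each coordinate's block.
    for j in range(d):
        blk = z[j * k:(j + 1) * k]
        nrm = np.linalg.norm(blk)
        scale = max(0.0, 1.0 - step * lam / nrm) if nrm > 0 else 0.0
        w[j * k:(j + 1) * k] = scale * blk

selected = [j for j in range(d) if np.linalg.norm(w[j * k:(j + 1) * k]) > 1e-8]
```

The block soft-thresholding step is what makes this an ℓ1-type method over components rather than over individual coefficients: a whole coordinate survives or is discarded together, mirroring the unknown support S in the model.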