Consider the standard linear regression model y = Xβ* + w, where y ∈ R^n is an observation vector, X ∈ R^{n×d} is a measurement matrix, β* ∈ R^d is the unknown regression vector, and w ∼ N(0, σ²I) is additive Gaussian noise. This paper determines sharp minimax rates of convergence for estimation of β* in l2-norm, assuming that β* belongs to a weak lq-ball Bq(Rq) for some q ∈ [0, 1]. We show that under suitable regularity conditions on the design matrix X, the minimax error in squared l2-norm scales as Rq (log d / n)^{1−q/2}. In addition, we provide lower bounds on rates of convergence for general lp-norms (for all p ∈ [1, +∞], p ≠ q). Our proofs of the lower bounds are information-theoretic in nature, based on Fano's inequality and results on the metric entropy of the balls Bq(Rq). Matching upper bounds are derived by direct analysis of the solution to an optimization algorithm over Bq(Rq). We prove that the conditions on X required by optimal algorithms are satisfied with high probability by broad classes of non-i.i.d. Gaussian random matrices, for which RIP or other sparse eigenvalue conditions are violated. For q = 0, l1-based methods (Lasso and Dantzig selector) achieve the minimax optimal rates in l2 error, but require stronger regularity conditions on the design than the nonconvex optimization algorithm used to determine the minimax upper bounds.
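The scaling of the minimax rate can be made concrete with a small sketch. The helper below (a hypothetical illustration, not code from the paper) evaluates the stated order of the squared l2-error, Rq (log d / n)^{1−q/2}, with all universal constants omitted; for hard sparsity (q = 0, R0 = s nonzeros) it reduces to the familiar s log(d)/n rate.

```python
import math

def minimax_rate(R_q: float, q: float, n: int, d: int) -> float:
    """Order of the minimax squared l2-error over a weak lq-ball Bq(Rq):
    R_q * (log d / n)^(1 - q/2), universal constants omitted."""
    assert 0.0 <= q <= 1.0 and n >= 1 and d >= 2
    return R_q * (math.log(d) / n) ** (1.0 - q / 2.0)

# Hard sparsity (q = 0): s = 5 nonzeros, n = 1000 samples, d = 10000
# coordinates gives the s * log(d) / n scaling.
rate_q0 = minimax_rate(5, 0.0, 1000, 10000)
print(rate_q0)
```

Note how the ambient dimension d enters only logarithmically, while the sample size n enters polynomially: doubling d barely moves the rate, whereas doubling n roughly halves it when q = 0.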