Consider the high-dimensional linear regression model $y = X \beta^* + w$, where $y \in \mathbb{R}^n$ is an observation vector, $X \in \mathbb{R}^{n \times d}$ is a design matrix with $d \gg n$, $\beta^* \in \mathbb{R}^d$ is an unknown regression vector, and $w \sim \mathcal{N}(0, \sigma^2 I)$ is additive Gaussian noise. This paper studies the minimax rates of convergence for estimating $\beta^*$ in either $\ell_2$-loss or $\ell_2$-prediction loss, assuming that $\beta^*$ belongs to an $\ell_q$-ball $\mathbb{B}_q(R_q)$ for some $q \in [0,1]$. It is shown that under suitable regularity conditions on the design matrix $X$, the minimax optimal rate in both $\ell_2$-loss and $\ell_2$-prediction loss scales as $\Theta\left(R_q \left(\frac{\log d}{n}\right)^{1-\frac{q}{2}}\right)$. The analysis in this paper reveals that conditions on the design matrix $X$ enter into the rates for $\ell_2$-error and $\ell_2$-prediction error in complementary ways in the upper and lower bounds. Our proofs of the lower bounds are information-theoretic in nature, based on Fano's inequality and results on the metric entropy of the balls $\mathbb{B}_q(R_q)$, whereas our proofs of the upper bounds are constructive, involving direct analysis of least squares over $\ell_q$-balls. For the special case $q = 0$, corresponding to models with an exact sparsity constraint, our results show that although computationally efficient $\ell_1$-based methods can achieve the minimax rates up to constant factors, they require slightly stronger assumptions on the design matrix $X$ than optimal algorithms involving least squares over the $\ell_0$-ball.
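To make the scaling concrete, the following sketch evaluates the rate $R_q (\log d / n)^{1 - q/2}$ up to constants (the function name `minimax_rate` and the specific values of $s$, $d$, $n$ are illustrative, not from the paper). For $q = 0$, where $R_0 = s$ counts the nonzero entries of $\beta^*$, the expression reduces to the familiar $s \log d / n$ rate for exactly sparse models.

```python
import math

def minimax_rate(q, R_q, d, n):
    """Rate Theta(R_q * (log d / n)^(1 - q/2)) for estimation over the
    ell_q-ball B_q(R_q); multiplicative constants are omitted."""
    return R_q * (math.log(d) / n) ** (1.0 - q / 2.0)

# Illustrative dimensions: s-sparse vector (q = 0) in d = 1000 with n = 500 samples.
s, d, n = 10, 1000, 500
print(minimax_rate(0, s, d, n))        # equals s * log(d) / n
print(minimax_rate(1, s, d, n))        # softer decay (log d / n)^(1/2) for q = 1
```

Note that the exponent $1 - q/2$ interpolates between the exact-sparsity rate at $q = 0$ and the slower $\sqrt{\log d / n}$ decay at $q = 1$, reflecting the weaker compressibility constraint as $q$ grows.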