Simultaneous Support Recovery in High Dimensions: Benefits and Perils of Block ℓ1/ℓ∞-Regularization

  • Authors:
  • S. N. Negahban; M. J. Wainwright

  • Affiliations:
  • Dept. of Electr. Eng. & Comput. Sci., Univ. of California, Berkeley, CA, USA

  • Venue:
  • IEEE Transactions on Information Theory
  • Year:
  • 2011

Abstract

Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports of size at most s. This set-up suggests the use of ℓ1/ℓ∞-regularized regression for joint estimation of the p × r matrix of regression coefficients. We analyze the high-dimensional scaling of ℓ1/ℓ∞-regularized quadratic programming, considering both consistency rates in ℓ∞-norm and how the minimal sample size n required for consistent variable selection scales with the model dimension, sparsity, and overlap between the supports. We first establish bounds on the ℓ∞-error as well as sufficient conditions for exact variable selection, both for fixed design matrices and for designs drawn randomly from general Gaussian distributions. Specializing to the case of r = 2 linear regression problems with standard Gaussian designs whose supports overlap in a fraction α ∈ [0, 1] of their entries, we prove that the ℓ1/ℓ∞-regularized method undergoes a phase transition characterized by the rescaled sample size θ1,∞(n, p, s, α) = n/[(4 − 3α) s log(p − (2 − α)s)]. An implication is that ℓ1/ℓ∞-regularization yields improved statistical efficiency when the overlap parameter is large enough (α > 2/3), but worse statistical efficiency than a naive Lasso-based approach for moderate to small overlap (α < 2/3). Empirical simulations illustrate the close agreement between theory and behavior in practice. These results show that caution must be exercised in applying ℓ1/ℓ∞ block regularization: if the data do not closely match the assumed shared-support structure, it can impair statistical performance relative to computationally less expensive schemes.
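
As a concrete illustration, the minimal sketch below sets up the ℓ1/ℓ∞-regularized quadratic program described in the abstract for r = 2 tasks with fully overlapping supports (α = 1), and evaluates the rescaled sample size θ1,∞. This is not the authors' implementation: the solver (cvxpy), the regularization level lam, and the support threshold 1e-3 are illustrative choices.

```python
# Minimal sketch of l1/linf-regularized joint regression (assumed setup,
# not the authors' code); cvxpy, lam, and the 1e-3 threshold are illustrative.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, r, s = 100, 200, 2, 10          # samples, dimension, tasks, sparsity

# Synthetic design and responses with fully overlapping supports (alpha = 1).
X = rng.standard_normal((n, p))
B_true = np.zeros((p, r))
B_true[:s, :] = rng.standard_normal((s, r))
Y = X @ B_true + 0.25 * rng.standard_normal((n, r))

# l1/linf block penalty: sum over rows of the largest absolute entry in the
# row, which encourages the r coefficient vectors to share a common support.
B = cp.Variable((p, r))
lam = 0.1
objective = cp.Minimize(
    cp.sum_squares(Y - X @ B) / (2 * n)
    + lam * cp.sum(cp.max(cp.abs(B), axis=1))
)
cp.Problem(objective).solve()

# Rows whose maximal entry exceeds a small threshold form the joint support.
support = np.where(np.max(np.abs(B.value), axis=1) > 1e-3)[0]
print("estimated joint support:", support)

# Rescaled sample size from the paper's r = 2 phase transition.
alpha = 1.0
theta = n / ((4 - 3 * alpha) * s * np.log(p - (2 - alpha) * s))
print("theta_{1,inf} =", round(theta, 2))
```

The penalty term cp.sum(cp.max(cp.abs(B), axis=1)) is the ℓ1/ℓ∞ block norm: a row costs the same whether one task or all r tasks use it, so zeroing entire rows across all tasks becomes attractive, which is precisely the shared-support structure the abstract warns can backfire when the true overlap is small.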