Simultaneous Support Recovery in High Dimensions: Benefits and Perils of Block ℓ1/ℓ∞-Regularization

  • Authors:
  • S. N. Negahban; M. J. Wainwright

  • Affiliations:
  • Dept. of Electr. Eng. & Comput. Sci., Univ. of California, Berkeley, CA, USA

  • Venue:
  • IEEE Transactions on Information Theory
  • Year:
  • 2011

Abstract

Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports of size at most s. This set-up suggests the use of ℓ1/ℓ∞-regularized regression for joint estimation of the p × r matrix of regression coefficients. We analyze the high-dimensional scaling of ℓ1/ℓ∞-regularized quadratic programming, considering both consistency rates in ℓ∞-norm and how the minimal sample size n required for consistent variable selection scales with the model dimension, sparsity, and overlap between the supports. We first establish bounds on the ℓ∞-error as well as sufficient conditions for exact variable selection, both for fixed design matrices and for designs drawn randomly from general Gaussian distributions. Specializing to the case of r = 2 linear regression problems with standard Gaussian designs whose supports overlap in a fraction α ∈ [0, 1] of their entries, we prove that the ℓ1/ℓ∞-regularized method undergoes a phase transition characterized by the rescaled sample size θ1,∞(n, p, s, α) = n/[(4 − 3α) s log(p − (2 − α)s)]. An implication is that ℓ1/ℓ∞-regularization yields improved statistical efficiency when the overlap parameter is large enough (α > 2/3), but worse statistical efficiency than a naive Lasso-based approach for moderate to small overlap (α < 2/3). Empirical simulations illustrate the close agreement between theory and behavior in practice. These results show that caution must be exercised in applying ℓ1/ℓ∞ block regularization: if the data do not closely match the assumed shared-support structure, it can impair statistical performance relative to computationally less expensive schemes.
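
As a concrete illustration, the minimal sketch below sets up the ℓ1/ℓ∞-regularized quadratic program described in the abstract for r = 2 tasks with fully overlapping supports (α = 1), and evaluates the rescaled sample size θ1,∞. This is not the authors' implementation: the solver (cvxpy), the regularization level lam, and the support threshold 1e-3 are illustrative choices.

```python
# Minimal sketch of l1/linf-regularized joint regression (assumed setup,
# not the authors' code); cvxpy, lam, and the 1e-3 threshold are illustrative.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, r, s = 100, 200, 2, 10          # samples, dimension, tasks, sparsity

# Synthetic design and responses with fully overlapping supports (alpha = 1).
X = rng.standard_normal((n, p))
B_true = np.zeros((p, r))
B_true[:s, :] = rng.standard_normal((s, r))
Y = X @ B_true + 0.25 * rng.standard_normal((n, r))

# l1/linf block penalty: sum over rows of the largest absolute entry in the
# row, which encourages the r coefficient vectors to share a common support.
B = cp.Variable((p, r))
lam = 0.1
objective = cp.Minimize(
    cp.sum_squares(Y - X @ B) / (2 * n)
    + lam * cp.sum(cp.max(cp.abs(B), axis=1))
)
cp.Problem(objective).solve()

# Rows whose maximal entry exceeds a small threshold form the joint support.
support = np.where(np.max(np.abs(B.value), axis=1) > 1e-3)[0]
print("estimated joint support:", support)

# Rescaled sample size from the paper's r = 2 phase transition.
alpha = 1.0
theta = n / ((4 - 3 * alpha) * s * np.log(p - (2 - alpha) * s))
print("theta_{1,inf} =", round(theta, 2))
```

The penalty term cp.sum(cp.max(cp.abs(B), axis=1)) is the ℓ1/ℓ∞ block norm: a row costs the same whether one task or all r tasks use it, so zeroing entire rows across all tasks becomes attractive, which is precisely the shared-support structure the abstract warns can backfire when the true overlap is small.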