Regularizers for structured sparsity

  • Authors:
  • Charles A. Micchelli; Jean M. Morales; Massimiliano Pontil

  • Affiliations:
  • Charles A. Micchelli: Department of Mathematics, City University of Hong Kong, Hong Kong, People's Republic of China, and Department of Mathematics and Statistics, State University of New York, The University at Albany, ...
  • Jean M. Morales: Department of Computer Science, University College London, London WC1E, UK
  • Massimiliano Pontil: Department of Computer Science, University College London, London WC1E, UK

  • Venue:
  • Advances in Computational Mathematics
  • Year:
  • 2013

Abstract

We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. This problem is relevant in machine learning, statistics and signal processing. It is well known that linear regression can benefit from knowledge that the underlying regression vector is sparse. The combinatorial problem of selecting the nonzero components of this vector can be "relaxed" by regularizing the squared error with a convex penalty function like the ℓ1 norm. However, in many applications, additional conditions on the structure of the regression vector and its sparsity pattern are available. Incorporating this information into the learning method may lead to a significant decrease of the estimation error. In this paper, we present a family of convex penalty functions, which encode prior knowledge on the structure of the vector formed by the absolute values of the regression coefficients. This family subsumes the ℓ1 norm and is flexible enough to include different models of sparsity patterns, which are of practical and theoretical importance. We establish the basic properties of these penalty functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions. Numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso method and other related methods.
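To give a concrete sense of the structured-sparsity idea the abstract describes, below is a minimal sketch contrasting the ℓ1 (Lasso) penalty with a simple non-overlapping group penalty, both solved by proximal gradient descent (ISTA). This is not the authors' penalty family or their algorithm; the group-ℓ2 penalty is just one well-known example of a structure-encoding convex penalty, and all names (soft_threshold, group_shrink, prox_grad_lsq) and parameter values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise soft-thresholding: proximal operator of t * ||w||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def group_shrink(v, groups, t):
    # Block soft-thresholding: proximal operator of t * sum_g ||w_g||_2
    # for non-overlapping groups, which zeroes out whole groups at once.
    w = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        w[g] = 0.0 if norm <= t else (1.0 - t / norm) * v[g]
    return w

def prox_grad_lsq(X, y, lam, prox, iters=500):
    # ISTA for: min_w 0.5 * ||X w - y||^2 + lam * penalty(w),
    # where `prox` is the proximal operator of the penalty.
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)
        w = prox(w - grad / L, lam / L)
    return w

# Toy comparison: the true vector is sparse with nonzeros clustered in one group,
# the setting where a structured penalty should beat the plain Lasso.
rng = np.random.default_rng(0)
n, d = 50, 20
w_true = np.zeros(d)
w_true[:5] = 1.0
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)

groups = [np.arange(i, i + 5) for i in range(0, d, 5)]
w_lasso = prox_grad_lsq(X, y, 1.0, soft_threshold)
w_group = prox_grad_lsq(X, y, 1.0, lambda v, t: group_shrink(v, groups, t))
print("lasso error:", np.linalg.norm(w_lasso - w_true))
print("group error:", np.linalg.norm(w_group - w_true))
```

On data of this kind the group-penalized estimate typically recovers the support more cleanly, because the penalty encodes the prior that nonzero coefficients occur together; the paper's contribution is a much more general family of such penalties and a convergent algorithm for the resulting regularized least-squares problems.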