Regularizers for structured sparsity

Authors:
Charles A. Micchelli;Jean M. Morales;Massimiliano Pontil
Affiliations:
Department of Mathematics, City University of Hong Kong, Hong Kong, People's Republic of China and Department of Mathematics and Statistics, State University of New York, The University at Albany, ...;Department of Computer Science, University College London, England, UK WC1E;Department of Computer Science, University College London, England, UK WC1E
Venue:
Advances in Computational Mathematics
Year:
2013

Abstract

We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. This problem is relevant in machine learning, statistics and signal processing. It is well known that a linear regression can benefit from knowledge that the underlying regression vector is sparse. The combinatorial problem of selecting the nonzero components of this vector can be "relaxed" by regularizing the squared error with a convex penalty function like the ℓ1 norm. However, in many applications, additional conditions on the structure of the regression vector and its sparsity pattern are available. Incorporating this information into the learning method may lead to a significant decrease of the estimation error. In this paper, we present a family of convex penalty functions, which encode prior knowledge on the structure of the vector formed by the absolute values of the regression coefficients. This family subsumes the ℓ1 norm and is flexible enough to include different models of sparsity patterns, which are of practical and theoretical importance. We establish the basic properties of these penalty functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions. Numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso method and other related methods.
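
For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of ℓ1-regularized least squares (the Lasso) solved by a standard proximal gradient iteration (ISTA). It is not the paper's algorithm, and it does not implement the structured penalty family the paper proposes. The function names (`soft_threshold`, `objective`, `lasso_ista`), the toy data, and the step-size choice are all assumptions made for illustration.

```python
def soft_threshold(v, t):
    """Componentwise proximal operator of t * ||.||_1."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def objective(X, y, b, lam):
    """0.5 * ||X b - y||^2 + lam * ||b||_1 for a list-of-rows matrix X."""
    resid = [sum(xij * bj for xij, bj in zip(row, b)) - yi
             for row, yi in zip(X, y)]
    return 0.5 * sum(r * r for r in resid) + lam * sum(abs(bj) for bj in b)


def lasso_ista(X, y, lam, n_iter=200):
    """Proximal gradient (ISTA) for the Lasso, starting from the zero vector."""
    n_features = len(X[0])
    # Assumption: use the squared Frobenius norm of X as a safe upper bound
    # on the Lipschitz constant of the gradient of the squared-error term.
    step = 1.0 / sum(xij * xij for row in X for xij in row)
    b = [0.0] * n_features
    for _ in range(n_iter):
        resid = [sum(xij * bj for xij, bj in zip(row, b)) - yi
                 for row, yi in zip(X, y)]
        # Gradient of 0.5 * ||X b - y||^2 is X^T (X b - y).
        grad = [sum(row[j] * r for row, r in zip(X, resid))
                for j in range(n_features)]
        b = soft_threshold([bj - step * gj for bj, gj in zip(b, grad)],
                           step * lam)
    return b


# Hypothetical toy problem: the third feature carries no signal.
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
y = [2.0, 0.1, 0.0, 2.1]
b_hat = lasso_ista(X, y, lam=0.5)
```

The ℓ1 penalty treats every coefficient identically. The penalties described in the abstract instead encode prior knowledge about the vector of absolute values of the coefficients, which would require a different proximal step from the plain soft-thresholding used in this sketch.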