Elastic-net regularization in learning theory

  • Authors:
  • Christine De Mol; Ernesto De Vito; Lorenzo Rosasco

  • Affiliations:
  • Department of Mathematics and ECARES, Université Libre de Bruxelles, Campus Plaine CP 217, Bd du Triomphe, 1050 Brussels, Belgium; Dipartimento di Scienze per l'Architettura, Università di Genova, Stradone Sant'Agostino, 37, 16123, Genova, Italy and INFN, Sezione di Genova, Via Dodecaneso 33, 16146 Genova, Italy; Center for Biological and Computational Learning, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA 02139, United States and Dipartimento di Informatica e Scienze dell'Informa ...

  • Venue:
  • Journal of Complexity
  • Year:
  • 2009


Abstract

Within the framework of statistical learning theory, we analyze in detail the so-called elastic-net regularization scheme proposed by Zou and Hastie [H. Zou, T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B 67(2) (2005) 301-320] for the selection of groups of correlated variables. To investigate the statistical properties of this scheme, and in particular its consistency properties, we set up a suitable mathematical framework. Our setting is random-design regression, where we allow the response variable to be vector-valued and we consider prediction functions which are linear combinations of elements (features) in an infinite-dimensional dictionary. Under the assumption that the regression function admits a sparse representation on the dictionary, we prove that there exists a particular "elastic-net representation" of the regression function such that, as the number of data points increases, the elastic-net estimator is consistent not only for prediction but also for variable/feature selection. Our results include finite-sample bounds and an adaptive scheme to select the regularization parameter. Moreover, using convex analysis tools, we derive an iterative thresholding algorithm for computing the elastic-net solution which is different from the optimization procedure originally proposed in the above-cited work.
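
As a concrete illustration, and not a transcription of the algorithm derived in the paper, the sketch below implements the standard proximal-gradient (iterative soft-thresholding) update for the finite-dimensional, real-valued elastic-net functional. The function names, the regularization parameters lam1 and lam2, and the fixed step size are illustrative assumptions.

    import numpy as np

    def soft_threshold(v, t):
        # Componentwise soft-thresholding operator: sign(v) * max(|v| - t, 0).
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def elastic_net_ista(X, Y, lam1, lam2, n_iter=1000):
        # Minimize (1/2n)||Y - X b||^2 + lam1*||b||_1 + (lam2/2)*||b||^2
        # by proximal gradient: the squared-norm term is kept in the smooth
        # part, the l1 term is handled by the soft-thresholding prox.
        n, p = X.shape
        beta = np.zeros(p)
        # Step size 1/L, with L the Lipschitz constant of the smooth gradient.
        step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + lam2)
        for _ in range(n_iter):
            grad = X.T @ (X @ beta - Y) / n + lam2 * beta
            beta = soft_threshold(beta - step * grad, step * lam1)
        return beta

Keeping the squared-norm term in the smooth part makes each iteration a plain soft-thresholding step; equivalently, one can fold it into the proximal map and rescale the iterate by 1/(1 + step*lam2).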