L1 regularization is effective for feature selection, but the resulting optimization problem is challenging because the L1 norm is non-differentiable. In this paper we compare state-of-the-art optimization techniques for solving this problem under several loss functions, and we propose two new techniques. The first is based on a smooth (differentiable) convex approximation of the L1 regularizer that makes no assumptions about the loss function. The second addresses the non-differentiability of the L1 regularizer by recasting the problem as a constrained optimization problem, which is then solved with a specialized gradient projection method. Extensive comparisons show that the proposed approaches consistently rank among the best in convergence speed and efficiency, as measured by the number of function evaluations required.
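To make the two ideas concrete, the following is a minimal sketch for L1-regularized logistic regression. The specific surrogate for |w|, the fixed step size, and the function names (`smooth_l1`, `projected_gradient`) are illustrative assumptions for this sketch, not necessarily the exact choices made in the paper.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Logistic loss sum_i log(1 + exp(-y_i * x_i^T w)) and its gradient.
    Labels y are assumed to be in {-1, +1}."""
    z = y * (X @ w)
    loss = np.sum(np.logaddexp(0.0, -z))
    # sigma(-z) written via tanh for numerical stability
    grad = -X.T @ (y * 0.5 * (1.0 - np.tanh(z / 2.0)))
    return loss, grad

# Idea 1: replace the non-differentiable 1-norm with a smooth surrogate so that
# standard unconstrained gradient-based solvers apply.
def smooth_l1(w, alpha=1e4):
    """|w_i| ~ (1/alpha) * (log(1 + e^{-alpha w_i}) + log(1 + e^{alpha w_i})).
    The surrogate tightens as alpha grows; its gradient is tanh(alpha*w/2)."""
    val = np.sum(np.logaddexp(0.0, -alpha * w) + np.logaddexp(0.0, alpha * w)) / alpha
    grad = np.tanh(alpha * w / 2.0)
    return val, grad

def smoothed_objective(w, X, y, lam, alpha=1e4):
    f, g = logistic_loss_grad(w, X, y)
    r, rg = smooth_l1(w, alpha)
    return f + lam * r, g + lam * rg

# Idea 2: split w = w_plus - w_minus with w_plus, w_minus >= 0. The objective
# becomes differentiable over a simple bound-constrained set, so a projected
# gradient method (projection = clipping at zero) can be used.
def projected_gradient(X, y, lam, step=1e-3, iters=500):
    n = X.shape[1]
    wp = np.zeros(n)  # positive part of w
    wm = np.zeros(n)  # negative part of w
    for _ in range(iters):
        _, g = logistic_loss_grad(wp - wm, X, y)
        wp = np.maximum(0.0, wp - step * (g + lam))   # project onto w_plus >= 0
        wm = np.maximum(0.0, wm - step * (-g + lam))  # project onto w_minus >= 0
    return wp - wm
```

As a usage note, `smoothed_objective` can be handed directly to a quasi-Newton routine such as `scipy.optimize.minimize(..., jac=True)`, while `projected_gradient` returns a sparse-ish weight vector once the regularization weight `lam` is large enough to drive coordinates to zero.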