L1 regularization is effective for feature selection, but the resulting optimization problem is challenging because the L1 norm is non-differentiable. In this paper we compare state-of-the-art optimization techniques for solving this problem under several loss functions, and we propose two new techniques. The first is based on a smooth (differentiable) convex approximation of the L1 regularizer that makes no assumptions about the loss function. The second addresses the non-differentiability of the L1 regularizer by recasting the problem as a constrained optimization problem, which is then solved with a specialized gradient projection method. Extensive comparisons show that the proposed approaches consistently rank among the best in convergence speed and efficiency, as measured by the number of function evaluations required.
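To make the two ideas concrete, the following is a minimal sketch for L1-regularized logistic regression. The specific surrogate for |w|, the fixed step size, and the function names (`smooth_l1`, `projected_gradient`) are illustrative assumptions for this sketch, not necessarily the exact choices made in the paper.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Logistic loss sum_i log(1 + exp(-y_i * x_i^T w)) and its gradient.
    Labels y are assumed to be in {-1, +1}."""
    z = y * (X @ w)
    loss = np.sum(np.logaddexp(0.0, -z))
    # sigma(-z) written via tanh for numerical stability
    grad = -X.T @ (y * 0.5 * (1.0 - np.tanh(z / 2.0)))
    return loss, grad

# Idea 1: replace the non-differentiable 1-norm with a smooth surrogate so that
# standard unconstrained gradient-based solvers apply.
def smooth_l1(w, alpha=1e4):
    """|w_i| ~ (1/alpha) * (log(1 + e^{-alpha w_i}) + log(1 + e^{alpha w_i})).
    The surrogate tightens as alpha grows; its gradient is tanh(alpha*w/2)."""
    val = np.sum(np.logaddexp(0.0, -alpha * w) + np.logaddexp(0.0, alpha * w)) / alpha
    grad = np.tanh(alpha * w / 2.0)
    return val, grad

def smoothed_objective(w, X, y, lam, alpha=1e4):
    f, g = logistic_loss_grad(w, X, y)
    r, rg = smooth_l1(w, alpha)
    return f + lam * r, g + lam * rg

# Idea 2: split w = w_plus - w_minus with w_plus, w_minus >= 0. The objective
# becomes differentiable over a simple bound-constrained set, so a projected
# gradient method (projection = clipping at zero) can be used.
def projected_gradient(X, y, lam, step=1e-3, iters=500):
    n = X.shape[1]
    wp = np.zeros(n)  # positive part of w
    wm = np.zeros(n)  # negative part of w
    for _ in range(iters):
        _, g = logistic_loss_grad(wp - wm, X, y)
        wp = np.maximum(0.0, wp - step * (g + lam))   # project onto w_plus >= 0
        wm = np.maximum(0.0, wm - step * (-g + lam))  # project onto w_minus >= 0
    return wp - wm
```

As a usage note, `smoothed_objective` can be handed directly to a quasi-Newton routine such as `scipy.optimize.minimize(..., jac=True)`, while `projected_gradient` returns a sparse-ish weight vector once the regularization weight `lam` is large enough to drive coordinates to zero.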