On the limited memory BFGS method for large scale optimization
Mathematical Programming: Series A and B
A limited memory algorithm for bound constrained optimization
SIAM Journal on Scientific Computing
Exponentiated gradient versus gradient descent for linear predictors
Information and Computation
Making large-scale support vector machine learning practical
Advances in kernel methods
Newton's Method for Large Bound-Constrained Optimization Problems
SIAM Journal on Optimization
Text Categorization Based on Regularized Linear Classification Methods
Information Retrieval
Feature Selection via Concave Minimization and Support Vector Machines
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Adaptive Sparseness for Supervised Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence
Grafting: fast, incremental feature selection by gradient descent in function space
The Journal of Machine Learning Research
A Feature Selection Newton Method for Support Vector Machine Classification
Computational Optimization and Applications
Gradient LASSO for feature selection
ICML '04 Proceedings of the twenty-first international conference on Machine learning
A Bayesian Approach to Joint Feature Selection and Classifier Design
IEEE Transactions on Pattern Analysis and Machine Intelligence
Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds
IEEE Transactions on Pattern Analysis and Machine Intelligence
Evaluation and extension of maximum entropy models with inequality constraints
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Exact 1-Norm Support Vector Machines Via Unconstrained Convex Differentiable Minimization
The Journal of Machine Learning Research
On Model Selection Consistency of Lasso
The Journal of Machine Learning Research
Scalable training of L1-regularized log-linear models
Proceedings of the 24th international conference on Machine learning
An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression
The Journal of Machine Learning Research
Efficient projections onto the l1-ball for learning in high dimensions
Proceedings of the 25th international conference on Machine learning
A dual coordinate descent method for large-scale linear SVM
Proceedings of the 25th international conference on Machine learning
Trust Region Newton Method for Logistic Regression
The Journal of Machine Learning Research
A coordinate gradient descent method for nonsmooth separable minimization
Mathematical Programming: Series A and B
Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches
ECML '07 Proceedings of the 18th European conference on Machine Learning
Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines
The Journal of Machine Learning Research
LIBLINEAR: A Library for Large Linear Classification
The Journal of Machine Learning Research
Boosting with structural sparsity
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Efficient Euclidean projections in linear time
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Stochastic methods for l1 regularized loss minimization
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Large-scale sparse logistic regression
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Sparse Online Learning via Truncated Gradient
The Journal of Machine Learning Research
Efficient L1-regularized logistic regression
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Sparse reconstruction by separable approximation
IEEE Transactions on Signal Processing
Fixed-Point Continuation for l1-Minimization: Methodology and Convergence
SIAM Journal on Optimization
Bundle Methods for Regularized Risk Minimization
The Journal of Machine Learning Research
A Fast Hybrid Algorithm for Large-Scale l1-Regularized Logistic Regression
The Journal of Machine Learning Research
Iterative Scaling and Coordinate Descent Methods for Maximum Entropy Models
The Journal of Machine Learning Research
A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning
The Journal of Machine Learning Research
A coordinate gradient descent method for l1-regularized convex minimization
Computational Optimization and Applications
Fast Solution of l1-Norm Minimization Problems When the Solution May Be Sparse
IEEE Transactions on Information Theory
A Fast Tracking Algorithm for Generalized LARS/LASSO
IEEE Transactions on Neural Networks
Training and Testing Low-degree Polynomial Data Mappings via Linear SVM
The Journal of Machine Learning Research
An improved GLMNET for l1-regularized logistic regression
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Proximal Methods for Hierarchical Sparse Coding
The Journal of Machine Learning Research
Structured Variable Selection with Sparsity-Inducing Norms
The Journal of Machine Learning Research
A novel feature selection method based on normalized mutual information
Applied Intelligence
An improved GLMNET for L1-regularized logistic regression
The Journal of Machine Learning Research
Learning class-to-image distance via large margin and l1-norm regularization
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II
Stochastic coordinate descent methods for regularized smooth and nonsmooth losses
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
Feature reduction for efficient object detection via l1-norm latent SVM
IScIDE'12 Proceedings of the third Sino-foreign-interchange conference on Intelligent Science and Intelligent Data Engineering
Learning non-linear classifiers with a sparsity constraint using L1 regularization
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Fast training of effective multi-class boosting using coordinate descent optimization
ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part II
Multi-target regression with rule ensembles
The Journal of Machine Learning Research
Large-scale linear support vector regression
The Journal of Machine Learning Research
Large-scale linear classification is widely used in many areas. The L1-regularized form can be applied for feature selection; however, its non-differentiability makes training more difficult. Although many optimization methods have been proposed in recent years, they have not been systematically compared. In this paper, we first give a broad review of existing methods. Then, we discuss state-of-the-art software packages in detail and propose two efficient implementations. Extensive comparisons indicate that carefully implemented coordinate descent methods are very suitable for training large document data.
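The abstract's conclusion favors coordinate descent for L1-regularized training. As a minimal illustration of that pattern (a sketch only, not the paper's software): for the simpler L1-regularized least-squares (lasso) objective, each single-coordinate subproblem has a closed-form soft-thresholding solution, which is the same mechanism the surveyed solvers adapt to logistic and SVM losses. Function names below are illustrative, not from any cited package.

```python
import numpy as np

def soft_threshold(z, lam):
    # Proximal operator of lam * |w|: shrink z toward zero by lam.
    return np.sign(z) * max(abs(z) - lam, 0.0)

def cd_lasso(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for min_w 0.5*||Xw - y||^2 + lam*||w||_1.

    Each coordinate update is exact: with the other coordinates fixed,
    the optimal w_j is soft_threshold(X_j^T r_j, lam) / ||X_j||^2,
    where r_j is the residual excluding coordinate j.
    """
    n, d = X.shape
    w = np.zeros(d)
    r = y - X @ w                      # residual, maintained incrementally
    col_sq = (X ** 2).sum(axis=0)      # ||X_j||^2 for each column
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # X_j^T r_j = X_j^T r + ||X_j||^2 * w_j (add back coordinate j)
            rho = X[:, j] @ r + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (w[j] - w_new)   # keep residual consistent
            w[j] = w_new
    return w
```

For sparse document data, the per-coordinate cost is proportional to the number of nonzeros in column j, which is why this style of method scales well there.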