A training algorithm for optimal margin classifiers
COLT '92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory
On the algorithmic implementation of multiclass kernel-based vector machines
The Journal of Machine Learning Research
In Defense of One-Vs-All Classification
The Journal of Machine Learning Research
On Model Selection Consistency of Lasso
The Journal of Machine Learning Research
Confidence-weighted linear classification
ICML '08: Proceedings of the 25th International Conference on Machine Learning
A coordinate gradient descent method for nonsmooth separable minimization
Mathematical Programming: Series A and B
Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines
The Journal of Machine Learning Research
Boosting with structural sparsity
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
Feature hashing for large scale multitask learning
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
Sparse reconstruction by separable approximation
IEEE Transactions on Signal Processing
A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
SIAM Journal on Imaging Sciences
Joint covariate selection and joint subspace selection for multiple classification problems
Statistics and Computing
Efficient Online and Batch Learning Using Forward Backward Splitting
The Journal of Machine Learning Research
Pegasos: primal estimated sub-gradient solver for SVM
Mathematical Programming: Series A and B, Special Issue on Optimization and Machine Learning
An improved GLMNET for l1-regularized logistic regression
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Optimization with Sparsity-Inducing Penalties
Foundations and Trends® in Machine Learning
Accelerated Block-coordinate Relaxation for Regularized Optimization
SIAM Journal on Optimization
Over the past decade, ℓ1 regularization has emerged as a powerful way to learn classifiers with implicit feature selection. More recently, mixed-norm (e.g., ℓ1/ℓ2) regularization has been used to select entire groups of features. In this paper, we propose a novel direct multiclass formulation specifically designed for large-scale and high-dimensional problems such as document classification. Based on a multiclass extension of the squared hinge loss, our formulation employs ℓ1/ℓ2 regularization so that the weights corresponding to the same feature are driven to zero simultaneously across all classes, resulting in compact and fast-to-evaluate multiclass models. For optimization, we employ two globally convergent variants of block coordinate descent, one with line search (Tseng and Yun, Math. Program. 117:387–423, 2009) and one without (Richtárik and Takáč, Math. Program. 1–38, 2012a; Tech. Rep. arXiv:1212.0873, 2012b). We present the two variants in a unified manner and develop the core components needed to solve our formulation efficiently. The end result is a pair of block coordinate descent algorithms specifically tailored to our multiclass formulation. Experimentally, we show that block coordinate descent compares favorably to other solvers such as FOBOS, FISTA, and SpaRSA. Furthermore, we show that our formulation yields very compact multiclass models and outperforms ℓ1/ℓ2-regularized multiclass logistic regression in terms of training speed, while achieving comparable test accuracy.
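To make the formulation concrete, here is a minimal NumPy sketch of the two ingredients the abstract describes: one plausible multiclass extension of the squared hinge loss, and a cyclic block coordinate descent sweep in which each feature's row of the weight matrix W takes a gradient step followed by the ℓ1/ℓ2 proximal update (block soft-thresholding). This is an illustration under stated assumptions, not the authors' implementation; in particular, the margin definition, the fixed step size, and the per-feature gradient recomputation are stand-ins for the paper's exact loss and for the line search / block Lipschitz machinery of Tseng and Yun and of Richtárik and Takáč.

```python
import numpy as np

def grad_scores(W, X, y):
    """Gradient of a multiclass squared hinge loss,
        sum_i sum_{r != y_i} max(0, 1 - (x_i.w_{y_i} - x_i.w_r))^2,
    with respect to the score matrix X @ W (shape n_samples x n_classes).
    This margin definition is an assumption; the paper's loss may differ."""
    n = X.shape[0]
    scores = X @ W
    true = scores[np.arange(n), y]                 # score of the correct class
    margins = np.maximum(1.0 - (true[:, None] - scores), 0.0)
    margins[np.arange(n), y] = 0.0                 # no loss against the true class
    G = 2.0 * margins
    G[np.arange(n), y] = -G.sum(axis=1)            # correct class collects -2 * sum of margins
    return G

def bcd_sweep(W, X, y, lam, step):
    """One cyclic sweep of block coordinate descent: for each feature j,
    take a gradient step on row W[j], then apply the proximal operator of
    lam * ||W[j]||_2 (block soft-thresholding), which zeroes the whole row,
    i.e. discards the feature for every class at once, when its norm is small.
    Recomputing the score gradient per feature is simple but wasteful; an
    efficient implementation would maintain X @ W incrementally."""
    for j in range(X.shape[1]):
        g_j = X[:, j] @ grad_scores(W, X, y)       # gradient w.r.t. row j, shape (n_classes,)
        z = W[j] - step * g_j
        norm = np.linalg.norm(z)
        W[j] = max(1.0 - step * lam / max(norm, 1e-12), 0.0) * z
    return W

# Toy usage on synthetic data; `step` is a hand-picked constant standing in
# for the line search / Lipschitz constants used by the actual algorithms.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = rng.integers(0, 3, size=200)
W = np.zeros((50, 3))
for _ in range(20):
    W = bcd_sweep(W, X, y, lam=1.0, step=1e-3)
print("nonzero feature rows:", int(np.count_nonzero(np.linalg.norm(W, axis=1))))
```

The row-wise ℓ2 norms inside the ℓ1/ℓ2 penalty are what couple the classes: a feature is either kept, with a full row of weights, or dropped for all classes simultaneously, which is what yields the compact models the abstract reports.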