A training algorithm for optimal margin classifiers
COLT '92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory
On the algorithmic implementation of multiclass kernel-based vector machines
The Journal of Machine Learning Research
In Defense of One-Vs-All Classification
The Journal of Machine Learning Research
On Model Selection Consistency of Lasso
The Journal of Machine Learning Research
Confidence-weighted linear classification
ICML '08: Proceedings of the 25th International Conference on Machine Learning
A coordinate gradient descent method for nonsmooth separable minimization
Mathematical Programming: Series A and B
Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines
The Journal of Machine Learning Research
Boosting with structural sparsity
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
Feature hashing for large scale multitask learning
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
Sparse reconstruction by separable approximation
IEEE Transactions on Signal Processing
A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
SIAM Journal on Imaging Sciences
Joint covariate selection and joint subspace selection for multiple classification problems
Statistics and Computing
Efficient Online and Batch Learning Using Forward Backward Splitting
The Journal of Machine Learning Research
Pegasos: primal estimated sub-gradient solver for SVM
Mathematical Programming: Series A and B, Special Issue on Optimization and Machine Learning
An improved GLMNET for l1-regularized logistic regression
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Optimization with Sparsity-Inducing Penalties
Foundations and Trends® in Machine Learning
Accelerated Block-coordinate Relaxation for Regularized Optimization
SIAM Journal on Optimization
Over the past decade, ℓ1 regularization has emerged as a powerful way to learn classifiers with implicit feature selection. More recently, mixed-norm (e.g., ℓ1/ℓ2) regularization has been used to select entire groups of features. In this paper, we propose a novel direct multiclass formulation specifically designed for large-scale and high-dimensional problems such as document classification. Based on a multiclass extension of the squared hinge loss, our formulation employs ℓ1/ℓ2 regularization so that the weights corresponding to the same feature are driven to zero simultaneously across all classes, resulting in compact and fast-to-evaluate multiclass models. For optimization, we employ two globally convergent variants of block coordinate descent, one with line search (Tseng and Yun, Math. Program. 117:387–423, 2009) and one without (Richtárik and Takáč, Math. Program. 1–38, 2012a; Tech. Rep. arXiv:1212.0873, 2012b). We present the two variants in a unified manner and develop the core components needed to solve our formulation efficiently. The end result is a pair of block coordinate descent algorithms specifically tailored to our multiclass formulation. Experimentally, we show that block coordinate descent compares favorably to other solvers such as FOBOS, FISTA, and SpaRSA. Furthermore, we show that our formulation yields very compact multiclass models and outperforms ℓ1/ℓ2-regularized multiclass logistic regression in terms of training speed, while achieving comparable test accuracy.
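To make the formulation concrete, here is a minimal NumPy sketch of the two ingredients the abstract describes: one plausible multiclass extension of the squared hinge loss, and a cyclic block coordinate descent sweep in which each feature's row of the weight matrix W takes a gradient step followed by the ℓ1/ℓ2 proximal update (block soft-thresholding). This is an illustration under stated assumptions, not the authors' implementation; in particular, the margin definition, the fixed step size, and the per-feature gradient recomputation are stand-ins for the paper's exact loss and for the line search / block Lipschitz machinery of Tseng and Yun and of Richtárik and Takáč.

```python
import numpy as np

def grad_scores(W, X, y):
    """Gradient of a multiclass squared hinge loss,
        sum_i sum_{r != y_i} max(0, 1 - (x_i.w_{y_i} - x_i.w_r))^2,
    with respect to the score matrix X @ W (shape n_samples x n_classes).
    This margin definition is an assumption; the paper's loss may differ."""
    n = X.shape[0]
    scores = X @ W
    true = scores[np.arange(n), y]                 # score of the correct class
    margins = np.maximum(1.0 - (true[:, None] - scores), 0.0)
    margins[np.arange(n), y] = 0.0                 # no loss against the true class
    G = 2.0 * margins
    G[np.arange(n), y] = -G.sum(axis=1)            # correct class collects -2 * sum of margins
    return G

def bcd_sweep(W, X, y, lam, step):
    """One cyclic sweep of block coordinate descent: for each feature j,
    take a gradient step on row W[j], then apply the proximal operator of
    lam * ||W[j]||_2 (block soft-thresholding), which zeroes the whole row,
    i.e. discards the feature for every class at once, when its norm is small.
    Recomputing the score gradient per feature is simple but wasteful; an
    efficient implementation would maintain X @ W incrementally."""
    for j in range(X.shape[1]):
        g_j = X[:, j] @ grad_scores(W, X, y)       # gradient w.r.t. row j, shape (n_classes,)
        z = W[j] - step * g_j
        norm = np.linalg.norm(z)
        W[j] = max(1.0 - step * lam / max(norm, 1e-12), 0.0) * z
    return W

# Toy usage on synthetic data; `step` is a hand-picked constant standing in
# for the line search / Lipschitz constants used by the actual algorithms.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = rng.integers(0, 3, size=200)
W = np.zeros((50, 3))
for _ in range(20):
    W = bcd_sweep(W, X, y, lam=1.0, step=1e-3)
print("nonzero feature rows:", int(np.count_nonzero(np.linalg.norm(W, axis=1))))
```

The row-wise ℓ2 norms inside the ℓ1/ℓ2 penalty are what couple the classes: a feature is either kept, with a full row of weights, or dropped for all classes simultaneously, which is what yields the compact models the abstract reports.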