Stochastic Coordinate Descent (SCD) methods are among the first optimization schemes proposed for efficiently solving large-scale problems. However, until now, a gap has existed between the convergence-rate analysis and practical SCD algorithms for general smooth losses, and no primal SCD algorithm has been available for nonsmooth losses. In this paper, we address these issues using recently developed structural optimization techniques. In particular, we first present a principled and practical SCD algorithm for regularized smooth losses, in which the one-variable subproblem is solved with a proximal gradient step and an adaptive componentwise Lipschitz constant is obtained via a line-search strategy. When the loss is nonsmooth, we present a novel SCD algorithm in which the one-variable subproblem is solved with the dual averaging method. We show that our algorithms exploit the regularization structure and achieve several convergence rates that are optimal and standard in the literature. Experiments demonstrate the expected efficiency of our SCD algorithms in both the smooth and nonsmooth cases.
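To make the smooth-loss case concrete, the sketch below illustrates the general scheme described in the abstract, not the authors' reference implementation: stochastic coordinate descent for an L1-regularized least-squares loss, where each one-variable subproblem is solved by a proximal gradient (soft-thresholding) step and the componentwise Lipschitz constant is estimated adaptively by backtracking line search. The objective, function names, and constants are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of t * |.|
    return np.sign(z) * max(abs(z) - t, 0.0)

def scd_l1_least_squares(X, y, lam, iters=10000, L0=1.0, eta=2.0, seed=0):
    """Sketch of SCD for F(w) = 0.5 * ||X w - y||^2 + lam * ||w||_1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    r = X @ w - y                 # residual, updated incrementally
    L = np.full(d, L0)            # adaptive componentwise Lipschitz estimates
    for _ in range(iters):
        j = rng.integers(d)
        g_j = X[:, j] @ r         # partial gradient of the smooth part
        f_old = 0.5 * (r @ r)
        while True:
            # proximal gradient step on coordinate j with current estimate L[j]
            w_j_new = soft_threshold(w[j] - g_j / L[j], lam / L[j])
            delta = w_j_new - w[j]
            r_new = r + delta * X[:, j]
            f_new = 0.5 * (r_new @ r_new)
            # backtracking: accept once the quadratic upper bound holds
            if f_new <= f_old + g_j * delta + 0.5 * L[j] * delta * delta + 1e-12:
                break
            L[j] *= eta           # increase the Lipschitz estimate and retry
        w[j] = w_j_new
        r = r_new
    return w
```

Keeping the residual r up to date makes each coordinate step cost O(n), which is what makes the scheme attractive at scale; the nonsmooth-loss variant in the paper replaces the proximal gradient step on the subproblem with dual averaging.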