Iterative methods that compute their steps from approximate subgradient directions have proved useful for stochastic learning problems over large and streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, the solution often lies on a low-dimensional manifold of the parameter space along which the regularizer is smooth. (When an ℓ1 regularizer is used to induce sparsity in the solution, for example, this manifold is defined by the set of nonzero components of the parameter vector.) This paper shows that a regularized dual averaging algorithm can identify this manifold, with high probability, before reaching the solution. This observation motivates an algorithmic strategy in which, once an iterate is suspected of lying on an optimal or near-optimal manifold, we switch to a "local phase" that searches within this manifold, thus converging rapidly to a near-optimal point. Computational results verify the identification property and illustrate the effectiveness of this approach.
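To make the setting concrete, here is a minimal Python/NumPy sketch of ℓ1-regularized dual averaging on a synthetic sparse regression problem. It is an illustration under stated assumptions, not the paper's implementation: the names (rda_l1, lam, gamma) and the test problem are invented for this sketch, while the update itself is the standard closed-form soft-thresholding step for dual averaging with an ℓ1 regularizer. The printed nonzero pattern of the iterate is the low-dimensional manifold the abstract refers to.

```python
# Sketch of l1-regularized dual averaging (RDA) on a stochastic
# least-squares loss. The step uses the closed-form RDA update
#   w_{t+1, j} = -(sqrt(t)/gamma) * soft(gbar_{t, j}, lam),
# where gbar_t is the running average of stochastic subgradients.
import numpy as np

def rda_l1(sample_grad, dim, lam=0.05, gamma=5.0, n_iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    gbar = np.zeros(dim)          # running average of stochastic subgradients
    for t in range(1, n_iters + 1):
        g = sample_grad(w, rng)   # stochastic subgradient of the loss at w
        gbar += (g - gbar) / t    # gbar_t = (1/t) * sum of gradients so far
        # Soft-threshold the averaged gradient by lam, then scale by
        # sqrt(t)/gamma (prox step with strong-convexity weight gamma*sqrt(t)).
        shrunk = np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
        w = -(np.sqrt(t) / gamma) * shrunk
    return w

if __name__ == "__main__":
    # Synthetic sparse regression: only the first 3 of 20 coordinates matter.
    w_true = np.zeros(20)
    w_true[:3] = [2.0, -1.5, 1.0]

    def sample_grad(w, rng):
        x = rng.standard_normal(20)
        y = x @ w_true + 0.1 * rng.standard_normal()
        return (x @ w - y) * x    # gradient of the sample loss 0.5*(x.w - y)^2

    w = rda_l1(sample_grad, dim=20)
    # The support of w (its set of nonzero components) defines the manifold;
    # once RDA identifies it, it should stabilize at {0, 1, 2} here.
    print("identified support:", np.flatnonzero(w))
```

In the strategy the abstract describes, once this support stabilizes across iterations one would switch to a local phase that optimizes over only those coordinates; that switch is omitted from the sketch above.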