We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as the l1-norm for promoting sparsity. We develop extensions of Nesterov's dual averaging method that can exploit the regularization structure in an online setting. At each iteration of these methods, the learning variables are adjusted by solving a simple minimization problem that involves the running average of all past subgradients of the loss function and the whole regularization term, not just its subgradient. In the case of l1-regularization, our method is particularly effective in obtaining sparse solutions. We show that these methods achieve the optimal convergence rates or regret bounds that are standard in the literature on stochastic and online convex optimization. For stochastic learning problems in which the loss functions have Lipschitz continuous gradients, we also present an accelerated version of the dual averaging method.
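To make the per-iteration step concrete, the following is a minimal Python sketch of an l1-regularized dual-averaging update of the kind described above: each step minimizes the running average of past subgradients plus the full l1 term plus a strongly convex proximal term, which admits a closed-form soft-thresholding solution. The step-size constant `gamma` and the subgradient oracle `loss_subgrad` are illustrative assumptions, not quantities fixed by the abstract.

```python
import numpy as np

def l1_rda_step(g_bar, t, lam, gamma):
    """Closed-form minimizer (coordinate-wise) of
        <g_bar, w> + lam * ||w||_1 + (gamma / (2 * sqrt(t))) * ||w||_2^2,
    i.e. soft thresholding applied to the averaged subgradient.
    (Sketch; `gamma` is an assumed proximal step-size constant.)"""
    shrink = np.maximum(np.abs(g_bar) - lam, 0.0)   # zero out coordinates with small average gradient
    return -(np.sqrt(t) / gamma) * np.sign(g_bar) * shrink

def l1_rda(loss_subgrad, w0, T, lam=0.01, gamma=1.0):
    """Run T iterations of an l1-regularized dual-averaging scheme.

    `loss_subgrad(w, t)` is an assumed oracle returning a subgradient of the
    loss at w for the example observed at step t."""
    w = w0.copy()
    g_bar = np.zeros_like(w0)
    for t in range(1, T + 1):
        g = loss_subgrad(w, t)
        g_bar = ((t - 1) * g_bar + g) / t            # running average of all past subgradients
        w = l1_rda_step(g_bar, t, lam, gamma)        # uses the whole l1 term, not its subgradient
    return w
```

Because the whole l1 term enters the minimization rather than being linearized, any coordinate whose averaged subgradient stays below the threshold `lam` is set exactly to zero, which is why this style of update tends to produce sparse iterates.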