We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.
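The adaptive scaling described above can be sketched as the diagonal variant of the method: each coordinate's step size is divided by the square root of that coordinate's accumulated squared gradients, so rarely-seen but predictive features retain large steps while frequently-updated coordinates are damped. This is a minimal illustrative sketch, not the paper's full proximal framework; the function name `adagrad_diag`, the step size `eta`, the smoothing constant `eps`, and the quadratic test problem are all assumptions chosen for the example.

```python
import numpy as np

def adagrad_diag(grad_fn, x0, eta=0.5, eps=1e-8, n_steps=500):
    """Diagonal adaptive subgradient sketch: per-coordinate learning
    rates eta / sqrt(sum of squared past gradients). Hyperparameters
    here are illustrative, not from the paper."""
    x = np.asarray(x0, dtype=float).copy()
    g_sq = np.zeros_like(x)                    # accumulated squared gradients
    for _ in range(n_steps):
        g = grad_fn(x)
        g_sq += g * g                          # update per-coordinate geometry
        x -= eta * g / (np.sqrt(g_sq) + eps)   # adaptive per-coordinate step
    return x

# Toy problem: minimize f(x) = 0.5 * x^T diag(d) x, whose gradient is d * x.
# The coordinates are badly scaled, which is where per-coordinate
# adaptation helps a single global step size.
d = np.array([10.0, 0.1])
x_min = adagrad_diag(lambda x: d * x, np.array([1.0, 1.0]))
```

Note that on this problem the per-coordinate normalization makes the two trajectories identical despite the 100x difference in curvature, which is exactly the scale-robustness a single global learning rate lacks.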