Conjugate Directions for Stochastic Gradient Descent

  • Authors:
  • Nicol N. Schraudolph; Thore Graepel

  • Venue:
  • ICANN '02: Proceedings of the International Conference on Artificial Neural Networks

  • Year:
  • 2002

Abstract

The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) setting, using fast Hessian-gradient products to set up low-dimensional Krylov subspaces within individual mini-batches. In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent.
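
For orientation, here is a minimal sketch of the general idea the abstract describes: within each mini-batch, cheap Hessian-gradient products seed a small Krylov subspace, and the parameter update is the minimizer of the local quadratic model restricted to that subspace. This is an illustration under stated assumptions, not the authors' exact algorithm; the least-squares loss, the subspace dimension k, and the names loss, hvp, and krylov_step are all hypothetical. JAX is used for the fast Hessian-vector products.

```python
import jax
import jax.numpy as jnp

def loss(w, batch):
    # Hypothetical mini-batch least-squares loss; any twice-differentiable
    # loss works here. `batch` is an (inputs, targets) pair.
    X, y = batch
    return jnp.mean((X @ w - y) ** 2)

def hvp(w, batch, v):
    # Fast Hessian-vector product H v via forward-over-reverse
    # differentiation (Pearlmutter's trick); H is never formed explicitly.
    return jax.jvp(lambda u: jax.grad(loss)(u, batch), (w,), (v,))[1]

def krylov_step(w, batch, k=4, eps=1e-8):
    # Build an orthonormal basis of the Krylov subspace
    # span{g, Hg, ..., H^{k-1} g} from the mini-batch gradient g,
    # then minimize the local quadratic model within that subspace.
    g = jax.grad(loss)(w, batch)
    basis, v = [], g
    for _ in range(k):
        for b in basis:                   # Gram-Schmidt re-orthogonalization
            v = v - (v @ b) * b
        norm = jnp.linalg.norm(v)
        if norm < eps:                    # subspace has degenerated; stop early
            break
        v = v / norm
        basis.append(v)
        v = hvp(w, batch, v)              # next Krylov direction
    B = jnp.stack(basis)                  # (m, d), orthonormal rows
    HB = jnp.stack([hvp(w, batch, b) for b in basis])
    A = B @ HB.T                          # projected Hessian, (m, m)
    c = B @ g                             # projected gradient, (m,)
    # Newton step restricted to the subspace; assumes A is positive
    # definite -- a practical version would add damping/regularization.
    return w - B.T @ jnp.linalg.solve(A, c)
```

Iterating w = krylov_step(w, batch) over a stream of mini-batches then plays the role of the SGD update. With k = 1 the step reduces to steepest descent with the quadratically optimal step length along the gradient; larger k spends a few extra Hessian-gradient products per batch to search a richer set of conjugate directions.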