Training multilayer perceptrons with the extended Kalman algorithm. Advances in Neural Information Processing Systems 1.
Fast exact multiplication by the Hessian. Neural Computation.
Dynamics and algorithms for stochastic search.
Fast curvature matrix-vector products for second-order gradient descent. Neural Computation.
Automatic Learning Rate Maximization in Large Adaptive Machines. Advances in Neural Information Processing Systems 5.
Conjugate Directions for Stochastic Gradient Descent
ICANN '02: Proceedings of the International Conference on Artificial Neural Networks
We consider the problem of developing rapid, stable, and scalable stochastic gradient descent algorithms for the optimisation of very large nonlinear systems. Based on earlier work by Orr et al. on adaptive momentum, an efficient yet extremely unstable stochastic gradient descent algorithm, we develop a stabilised adaptive momentum algorithm suitable for noisy nonlinear optimisation problems. Stability is improved by introducing a forgetting factor 0 ≤ λ ≤ 1 that smooths the trajectory and enables adaptation in non-stationary environments. The scalability of the new algorithm follows from the fact that at each iteration the multiplication by the curvature matrix can be achieved in O(n) steps using automatic differentiation tools. We illustrate the behaviour of the new algorithm on two examples: a linear neuron with squared loss and highly correlated inputs, and a multilayer perceptron applied to the four-regions benchmark task.
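The O(n) curvature matrix-vector product mentioned above is the standard automatic-differentiation trick (forward-mode differentiation applied to the gradient), which costs only a small constant multiple of one gradient evaluation. A minimal sketch in JAX, using the abstract's first example of a linear neuron with squared loss; the function names `loss` and `hvp` and the random data are illustrative, not taken from the paper:

```python
import jax
import jax.numpy as jnp

# Squared loss of a linear neuron: L(w) = 0.5 * ||X @ w - y||^2
def loss(w, X, y):
    r = X @ w - y
    return 0.5 * jnp.dot(r, r)

def hvp(w, v, X, y):
    # Forward-over-reverse: differentiate grad(loss) along direction v.
    # This never forms the n x n Hessian; its cost scales like one
    # gradient evaluation, i.e. O(n) per iteration.
    return jax.jvp(lambda w_: jax.grad(loss)(w_, X, y), (w,), (v,))[1]

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (100, 5))
y = X @ jnp.ones(5)
w = jnp.zeros(5)
v = jnp.ones(5)

# For squared loss the Hessian is exactly X.T @ X, so we can verify:
print(bool(jnp.allclose(hvp(w, v, X, y), X.T @ (X @ v), atol=1e-4)))
```

The same pattern extends to a multilayer perceptron by replacing `loss` with the network's loss function; curvature-matrix variants (e.g. Gauss-Newton products) follow the same matrix-free structure.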