We propose a generic method for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n), where n is the number of parameters. Two recent acceleration techniques for online learning, matrix momentum and stochastic meta-descent (SMD), implement this approach. Because the two were originally derived by very different routes, this perspective offers fresh insight into their operation and leads to further improvements to SMD.
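As a concrete illustration of such an O(n) curvature matrix-vector product, the sketch below computes a Hessian-vector product by differentiating the gradient in a given direction (the forward-over-reverse trick behind Pearlmutter's fast exact multiplication by the Hessian). It is written in JAX; the toy loss function, shapes, and variable names are illustrative assumptions, not part of the original paper.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Toy regression loss; stands in for any scalar objective.
    pred = jnp.tanh(x @ w)
    return jnp.mean((pred - y) ** 2)

def hvp(loss_fn, w, v, *args):
    # Hessian-vector product H v, obtained by differentiating the
    # gradient in direction v (forward-over-reverse AD). Each product
    # costs about as much as one gradient evaluation, i.e. O(n) in the
    # parameter count n, and the n-by-n Hessian is never formed.
    grad_fn = lambda w_: jax.grad(loss_fn)(w_, *args)
    _, hv = jax.jvp(grad_fn, (w,), (v,))
    return hv

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 5))
y = jnp.zeros(32)
w = jnp.ones(5)
v = jnp.arange(5.0)
print(hvp(loss, w, v, x, y))  # H v, a length-5 vector
```

With products like this available, an iterative solver can approximate a second-order step without ever materializing the curvature matrix; the Gauss-Newton and natural-gradient variants the abstract mentions substitute the corresponding curvature matrix into the same kind of product.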