Natural conjugate gradient training of multilayer perceptrons
ICANN'06: Proceedings of the 16th International Conference on Artificial Neural Networks, Part I
Natural gradient (NG) descent, arguably the fastest on-line method for multilayer perceptron (MLP) training, exploits the "natural" Riemannian metric that the Fisher information matrix defines in the MLP weight space. It also accelerates ordinary gradient descent in a batch setting, but there the Fisher matrix essentially coincides with the Gauss-Newton approximation of the Hessian of the MLP square-error function; NG is thus closely related to the Levenberg-Marquardt (LM) method, which may explain its speed-up over standard gradient descent. Even this comparison, however, flatters NG descent, since it should only achieve linear convergence in the Riemannian weight space, against the superlinear convergence of the LM method in the Euclidean one. This suggests that it may be worthwhile to consider superlinear methods for MLP training in a Riemannian setting. In this work we discuss how to introduce a natural conjugate gradient (CG) method for MLP training. While a fully Riemannian formulation would be extremely costly, we make some simplifying assumptions that should yield descent directions with properties similar to those of standard CG descent. Moreover, we show numerically that natural CG may converge faster and to better minima, although at a greater cost than standard CG; this extra cost can nevertheless be alleviated with a diagonal natural CG variant.
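
To make the role of the metric in the descent direction concrete, here is a rough NumPy sketch, not the authors' algorithm: it takes the batch Fisher matrix to be the (damped) Gauss-Newton matrix J^T J / N of a small tanh MLP under the square-error loss and builds a CG-like direction preconditioned by it, with a Polak-Ribiere-style beta evaluated in that metric. The tiny network, the synthetic data, the damping lam, the fixed step size and the beta formula are all assumptions made for the illustration; the comment in natural_cg_direction indicates where the diagonal variant mentioned above would enter.

```python
import numpy as np

def mlp_out(w, X, h):
    """Output of a tiny one-hidden-layer tanh MLP; w packs W1, b1, W2, b2."""
    d = X.shape[1]
    W1 = w[:d * h].reshape(d, h)
    b1 = w[d * h:d * h + h]
    W2 = w[d * h + h:d * h + 2 * h]
    b2 = w[-1]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def jacobian(w, X, h, eps=1e-6):
    """Per-sample Jacobian of the network output w.r.t. the weights (central differences)."""
    J = np.empty((X.shape[0], w.size))
    for p in range(w.size):
        dw = np.zeros_like(w)
        dw[p] = eps
        J[:, p] = (mlp_out(w + dw, X, h) - mlp_out(w - dw, X, h)) / (2 * eps)
    return J

def natural_cg_direction(J, r, d_prev, g_prev, lam=1e-3):
    """CG-like direction preconditioned with the batch Fisher / Gauss-Newton matrix.
    A diagonal variant would keep only np.diag(np.diag(G)), reducing the per-step cost."""
    N, P = J.shape
    g = J.T @ r / N                        # gradient of the mean square error
    G = J.T @ J / N + lam * np.eye(P)      # Fisher / Gauss-Newton matrix (damped)
    nat_g = np.linalg.solve(G, g)          # "natural" gradient G^{-1} g
    if d_prev is None:
        return -nat_g, g
    # Polak-Ribiere-style beta, with inner products taken in the G^{-1} metric
    beta = max(0.0, nat_g @ (g - g_prev) / (g_prev @ np.linalg.solve(G, g_prev)))
    return -nat_g + beta * d_prev, g

# Minimal usage: a synthetic regression problem with a fixed step size
# (a line search would normally be used instead).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]
h = 5
w = 0.1 * rng.normal(size=X.shape[1] * h + 2 * h + 1)
d_prev, g_prev = None, None
for k in range(20):
    J = jacobian(w, X, h)
    r = mlp_out(w, X, h) - y
    d_prev, g_prev = natural_cg_direction(J, r, d_prev, g_prev)
    w = w + 0.1 * d_prev
    print(k, 0.5 * np.mean(r ** 2))
```

In a more realistic implementation the Jacobian would come from backpropagation and the step length from a line search; the cost of forming and solving with G is exactly the overhead that the diagonal variant mentioned in the abstract is meant to reduce.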