Self-scaled conjugate gradient training algorithms

  • Authors:
  • A. E. Kostopoulos; T. N. Grapsa

  • Affiliations:
  • University of Patras, Department of Mathematics, GR-265 04 Rio, Patras, Greece (both authors)

  • Venue:
  • Neurocomputing

  • Year:
  • 2009


Abstract

This article presents efficient training algorithms based on conjugate gradient optimization methods. In addition to the existing conjugate gradient training algorithms, we introduce Perry's conjugate gradient method as a training algorithm [A. Perry, A modified conjugate gradient algorithm, Operations Research 26 (1978) 26-43]. Perry's method has proven very efficient in unconstrained optimization, but it has never been used in MLP training. Furthermore, a new class of conjugate gradient (CG) methods is proposed, called self-scaled CG methods, derived from the principles of the Hestenes-Stiefel, Fletcher-Reeves, Polak-Ribière, and Perry methods. This class is based on the spectral scaling parameter introduced in [J. Barzilai, J.M. Borwein, Two point step size gradient methods, IMA Journal of Numerical Analysis 8 (1988) 141-148]; the spectral scaling parameter captures second-order information without estimating the Hessian matrix. We also incorporate into the CG training algorithms an efficient line search technique based on the Wolfe conditions and on safeguarded cubic interpolation [D.F. Shanno, K.H. Phua, Minimization of unconstrained multivariate functions, ACM Transactions on Mathematical Software 2 (1976) 87-94]. In addition, the initial learning rate parameter fed to the line search is automatically adapted at each iteration by a closed formula proposed in [D.F. Shanno, K.H. Phua, Minimization of unconstrained multivariate functions, ACM Transactions on Mathematical Software 2 (1976) 87-94; D.G. Sotiropoulos, A.E. Kostopoulos, T.N. Grapsa, A spectral version of Perry's conjugate gradient method for neural network training, in: D.T. Tsahalis (Ed.), Fourth GRACM Congress on Computational Mechanics, vol. 1, 2002, pp. 172-179]. Finally, an efficient restarting procedure is employed to further improve the effectiveness of the CG training algorithms. Experimental results show that, in general, the new class of methods outperforms the existing CG training algorithms, with much lower computational cost and a higher success rate.