This paper introduces our numerical linear algebra approaches to solving the structured nonlinear least-squares problems that arise from 'multiple-output' neural-network (NN) models. Our algorithms feature trust-region regularization and, depending on problem scale, exploit the sparsity of either the 'block-angular' residual Jacobian matrix or the 'block-arrow' Gauss-Newton Hessian (the Fisher information matrix in the statistical sense), rendering a large class of NN-learning algorithms efficient in both memory and operation counts. Using a relatively large real-world nonlinear regression application, we explain the algorithmic strengths and weaknesses, analyzing simulation results obtained with both direct and iterative trust-region algorithms on two distinct NN models: 'multilayer perceptrons' (MLP) and 'complementary mixtures of MLP-experts' (neuro-fuzzy modular networks).
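As a rough illustration of the structure the abstract describes (not the authors' implementation), the sketch below shows why a block-angular Jacobian yields a block-arrow Gauss-Newton Hessian and how a direct trust-region-style step can exploit it. All names (solve_block_arrow, Ds, Cs, A, lam) are hypothetical, and the Levenberg-Marquardt-style damping lam stands in for the full trust-region subproblem.

```python
# Minimal sketch, assuming a multiple-output network whose residual
# Jacobian J is block-angular: per-output blocks J_i (output-specific
# weights) plus shared blocks B_i (weights common to all outputs).
# The Gauss-Newton Hessian H = J^T J is then "block-arrow":
#
#     H = [ D_1              C_1 ]
#         [       ...       ...  ]
#         [            D_m  C_m  ]
#         [ C_1^T ... C_m^T  A   ]
#
# with D_i = J_i^T J_i, C_i = J_i^T B_i, and A = sum_i B_i^T B_i.
# The damped system (H + lam*I) p = -g is solved by block elimination:
# factor each small D_i block, then solve the Schur complement in the
# shared-weight coordinates. Only the nonzero blocks are ever stored.
import numpy as np

def solve_block_arrow(Ds, Cs, A, g_blocks, g_shared, lam):
    """Solve (H + lam*I) p = -g for a block-arrow H given by its blocks."""
    S = A + lam * np.eye(A.shape[0])       # Schur complement accumulator
    rhs = -g_shared
    factors = []
    for D, C, g in zip(Ds, Cs, g_blocks):
        Dl = D + lam * np.eye(D.shape[0])  # damped diagonal block
        DinvC = np.linalg.solve(Dl, C)     # D_i^{-1} C_i
        Dinvg = np.linalg.solve(Dl, g)     # D_i^{-1} g_i
        S -= C.T @ DinvC                   # S = A + lam*I - sum C_i^T D_i^{-1} C_i
        rhs += C.T @ Dinvg                 # rhs = -g_s + sum C_i^T D_i^{-1} g_i
        factors.append((DinvC, Dinvg))
    p_shared = np.linalg.solve(S, rhs)     # step in the shared weights
    p_blocks = [-Dg - DC @ p_shared for DC, Dg in factors]
    return p_blocks, p_shared

# Illustrative call with random blocks (m outputs, block/shared widths n_b/n_s):
rng = np.random.default_rng(0)
m, n_b, n_s, n_r = 3, 4, 2, 10
Ji = [rng.standard_normal((n_r, n_b)) for _ in range(m)]
Bi = [rng.standard_normal((n_r, n_s)) for _ in range(m)]
Ds = [J.T @ J for J in Ji]
Cs = [J.T @ B for J, B in zip(Ji, Bi)]
A = sum(B.T @ B for B in Bi)
p_blocks, p_shared = solve_block_arrow(
    Ds, Cs, A,
    [rng.standard_normal(n_b) for _ in range(m)],
    rng.standard_normal(n_s), lam=0.1)
```

This direct block elimination suits moderate problem sizes; at larger scale, an iterative trust-region variant of the kind the abstract mentions would instead use a conjugate-gradient-type inner solve that needs only Hessian-vector products, avoiding explicit formation of the blocks.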