Sufficient conditions for error backflow convergence in dynamical recurrent neural networks

Authors:
Alex Aussem
Affiliations:
LIMOS (FRE CNRS 2239), University Blaise Pascal, Clermont Ferrand II, 63173 Aubiere Cedex, France
Venue:
Neural Computation
Year:
2002

Citing 11
Cited 2

Learning internal representations by error propagation

Parallel distributed processing: explorations in the microstructure of cognition, vol. 1
Local feedback multilayered networks

Neural Computation
A fixed size storage O(n3) time complexity learning algorithm for fully recurrent continually running networks

Neural Computation
Finite impulse response neural networks with applications in time series prediction

Finite impulse response neural networks with applications in time series prediction
Efficient training of recurrent neural network with time delays

Neural Networks
Long short-term memory

Neural Computation
Learning to Forget: Continual Prediction with LSTM

Neural Computation
Learning long-term dependencies in NARX recurrent neural networks

IEEE Transactions on Neural Networks
On-line learning algorithms for locally recurrent neural networks

IEEE Transactions on Neural Networks
Discrete-time backpropagation for training synaptic delay-based artificial neural networks

IEEE Transactions on Neural Networks
Gradient descent learning algorithm overview: a general dynamical systems perspective

IEEE Transactions on Neural Networks

A new boosting algorithm for improved time-series forecasting with recurrent neural networks

Information Fusion
Closed loop stability of FIR-recurrent neural networks

ICANN/ICONIP'03 Proceedings of the 2003 joint international conference on Artificial neural networks and neural information processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This article extends previous analysis of the gradient decay to a class of discrete-time fully recurrent networks, called dynamical recurrent neural networks, obtained by modeling synapses as finite impulse response (FIR) filters instead of multiplicative scalars. Using elementary matrix manipulations, we provide an upper bound on the norm of the weight matrix, ensuring that the gradient vector, when propagated in a reverse manner in time through the error-propagation network, decays exponentially to zero. This bound applies to all recurrent FIR architecture proposals, as well as fixed-point recurrent networks, regardless of delay and connectivity. In addition, we show that the computational overhead of the learning algorithm can be reduced drastically by taking advantage of the exponential decay of the gradient.