Recurrent neural networks (RNN) unfolded in time are in theory able to map any open dynamical system. Still, they are often claimed to be unable to identify long-term dependencies in the data. In particular, it is claimed that RNN unfolded in time and trained with backpropagation fail to learn inter-temporal influences that are more than 10 time steps apart. This paper refutes this often cited statement by giving counter-examples. We show that basic time-delay RNN unfolded in time and formulated as state space models are indeed capable of learning time lags of at least 100 time steps. We point out that they even possess a self-regularisation characteristic, which adapts the internal error backflow, and we analyse their optimal weight initialisation. In addition, we introduce the idea of inflation for modelling long- and short-term memory and demonstrate that this technique further improves the performance of RNN.
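A common way to write such a time-delay RNN as a state space model is s(t+1) = tanh(A s(t) + B u(t)), y(t) = C s(t), with the same weight matrices A, B, C shared across all unfolded time steps. The following is a minimal illustrative sketch, not the authors' implementation: it assumes PyTorch, and the class name StateSpaceRNN, the toy long-lag task in make_batch, and all hyperparameters (state_dim, init_scale, lag, learning rate) are assumptions chosen only for the example. It trains the unfolded network with backpropagation through time on a task where the target at the end of the sequence is the input observed roughly 100 time steps earlier.

# Minimal illustrative sketch, not the authors' implementation: a basic
# time-delay RNN unfolded in time, written as a state space model
#   s(t+1) = tanh(A s(t) + B u(t)),   y(t) = C s(t),
# with shared weights across all unfolded steps, trained by backpropagation
# through time on a toy task whose relevant input lies ~100 steps in the past.
# StateSpaceRNN, make_batch, and all hyperparameters are assumed for the example.

import torch
import torch.nn as nn

class StateSpaceRNN(nn.Module):
    def __init__(self, input_dim=1, state_dim=20, output_dim=1, init_scale=0.1):
        super().__init__()
        # Small random initial weights; the abstract notes that the
        # initialisation scale matters, here we simply pick a small value.
        self.A = nn.Parameter(init_scale * torch.randn(state_dim, state_dim))
        self.B = nn.Parameter(init_scale * torch.randn(state_dim, input_dim))
        self.C = nn.Parameter(init_scale * torch.randn(output_dim, state_dim))

    def forward(self, u):
        # u has shape (T, batch, input_dim); the same cell is unfolded T times.
        T, batch, _ = u.shape
        s = torch.zeros(batch, self.A.shape[0])
        for t in range(T):
            s = torch.tanh(s @ self.A.T + u[t] @ self.B.T)
        return s @ self.C.T  # read the output from the final state

def make_batch(batch=32, T=100, lag=100):
    # Toy long-term dependency: the target equals the input presented
    # `lag` steps before the end of the sequence; the rest is noise.
    u = 0.1 * torch.randn(T, batch, 1)
    key = torch.randn(batch, 1)
    u[T - lag] = key
    return u, key

model = StateSpaceRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    u, target = make_batch()
    loss = ((model(u) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 200 == 0:
        print(step, loss.item())

If the loss on this toy task falls well below the variance of the target, the unfolded network has carried the relevant information across the full time lag, which is the kind of counter-example the abstract refers to.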