Recurrent neural networks (RNN) unfolded in time are in theory able to map any open dynamical system. Still, they are often claimed to be unable to identify long-term dependencies in the data. In particular, it is claimed that RNN unfolded in time and trained with backpropagation fail to learn inter-temporal influences that are more than 10 time steps apart. This paper refutes this often cited statement by giving counter-examples. We show that basic time-delay RNN unfolded in time and formulated as state space models are indeed capable of learning time lags of at least 100 time steps. We point out that they even possess a self-regularisation characteristic, which adapts the internal error backflow, and we analyse their optimal weight initialisation. In addition, we introduce the idea of inflation for modelling long- and short-term memory and demonstrate that this technique further improves the performance of RNN.
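A common way to write such a time-delay RNN as a state space model is s(t+1) = tanh(A s(t) + B u(t)), y(t) = C s(t), with the same weight matrices A, B, C shared across all unfolded time steps. The following is a minimal illustrative sketch, not the authors' implementation: it assumes PyTorch, and the class name StateSpaceRNN, the toy long-lag task in make_batch, and all hyperparameters (state_dim, init_scale, lag, learning rate) are assumptions chosen only for the example. It trains the unfolded network with backpropagation through time on a task where the target at the end of the sequence is the input observed roughly 100 time steps earlier.

# Minimal illustrative sketch, not the authors' implementation: a basic
# time-delay RNN unfolded in time, written as a state space model
#   s(t+1) = tanh(A s(t) + B u(t)),   y(t) = C s(t),
# with shared weights across all unfolded steps, trained by backpropagation
# through time on a toy task whose relevant input lies ~100 steps in the past.
# StateSpaceRNN, make_batch, and all hyperparameters are assumed for the example.

import torch
import torch.nn as nn

class StateSpaceRNN(nn.Module):
    def __init__(self, input_dim=1, state_dim=20, output_dim=1, init_scale=0.1):
        super().__init__()
        # Small random initial weights; the abstract notes that the
        # initialisation scale matters, here we simply pick a small value.
        self.A = nn.Parameter(init_scale * torch.randn(state_dim, state_dim))
        self.B = nn.Parameter(init_scale * torch.randn(state_dim, input_dim))
        self.C = nn.Parameter(init_scale * torch.randn(output_dim, state_dim))

    def forward(self, u):
        # u has shape (T, batch, input_dim); the same cell is unfolded T times.
        T, batch, _ = u.shape
        s = torch.zeros(batch, self.A.shape[0])
        for t in range(T):
            s = torch.tanh(s @ self.A.T + u[t] @ self.B.T)
        return s @ self.C.T  # read the output from the final state

def make_batch(batch=32, T=100, lag=100):
    # Toy long-term dependency: the target equals the input presented
    # `lag` steps before the end of the sequence; the rest is noise.
    u = 0.1 * torch.randn(T, batch, 1)
    key = torch.randn(batch, 1)
    u[T - lag] = key
    return u, key

model = StateSpaceRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    u, target = make_batch()
    loss = ((model(u) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 200 == 0:
        print(step, loss.item())

If the loss on this toy task falls well below the variance of the target, the unfolded network has carried the relevant information across the full time lag, which is the kind of counter-example the abstract refers to.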