Learning to Forget: Continual Prediction with LSTM
Neural Computation
Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them, and in an elegant way.
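To make the mechanism concrete, the sketch below shows one update step of an LSTM cell extended with a forget gate, as described in the abstract: the forget gate multiplicatively scales the previous cell state, so a gate activation near zero resets the cell and frees its internal resources. This is a minimal NumPy illustration, not code from the paper; the function, parameter names, and dimensions are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, params):
    """One step of an LSTM cell with a forget gate (illustrative sketch).

    x: input vector; h_prev, c_prev: previous hidden and cell states.
    params: weight matrices W_* acting on [h_prev; x] and bias vectors b_*
    for the input (i), forget (f), and output (o) gates and the candidate
    cell update (g). Names are hypothetical, not from the paper.
    """
    z = np.concatenate([h_prev, x])
    i = sigmoid(params["W_i"] @ z + params["b_i"])  # input gate
    f = sigmoid(params["W_f"] @ z + params["b_f"])  # forget gate
    o = sigmoid(params["W_o"] @ z + params["b_o"])  # output gate
    g = np.tanh(params["W_g"] @ z + params["b_g"])  # candidate update
    # The forget gate scales the old state: f near 0 resets the cell,
    # f near 1 retains it. Without f, the state term is simply c_prev
    # and can grow without bound on continual input streams.
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

# Usage sketch on a continual (unsegmented) input stream.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
params = {}
for gate in ("i", "f", "o", "g"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
    params[f"b_{gate}"] = np.zeros(n_hid)
# A common convention: bias the forget gate toward 1 so the cell
# initially retains its state and learns when to reset.
params["b_f"] += 1.0

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(20, n_in)):
    h, c = lstm_cell_step(x, h, c, params)
```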