Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets

Authors:
Juan Antonio Pérez-Ortiz;Felix A. Gers;Douglas Eck;Jürgen Schmidhuber
Affiliations:
Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, E-03071 Alacant, Spain;Mantik Bioinformatik GmbH, Neue Gruenstrasse 18, 10179 Berlin, Germany;IDSIA, Galleria 2, 6928 Manno, Switzerland;IDSIA, Galleria 2, 6928 Manno, Switzerland
Venue:
Neural Networks
Year:
2003

Abstract

The long short-term memory (LSTM) network trained by gradient descent solves difficult problems that traditional recurrent neural networks generally cannot. We have recently observed that the decoupled extended Kalman filter training algorithm allows for even better performance, significantly reducing the number of training steps compared to the original gradient descent training algorithm. In this paper we present a set of experiments that are unsolvable by classical recurrent networks but that are solved elegantly, robustly, and quickly by LSTM combined with Kalman filters.
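
The abstract names the decoupled extended Kalman filter (DEKF) but does not spell out its update rule. As a rough sketch only, assuming the standard textbook form of DEKF and a model with a single scalar output, one training step might look like the following. The names `dekf_step`, `r`, and `q` are illustrative and not taken from the paper, and a linear toy model stands in for the LSTM gradient computation.

```python
# Illustrative DEKF step for a model with one scalar output.
# All names are hypothetical; this is not the paper's implementation.

def mat_vec(P, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dekf_step(weights, covs, grads, error, r=1.0, q=0.0):
    """Update every weight group in place; return the shared scaling term.

    weights: list of weight vectors, one per group
    covs:    list of symmetric error-covariance matrices P_i, one per group
    grads:   list of vectors h_i = d(output)/d(weights_i)
    error:   target minus current output
    r, q:    assumed measurement-noise and process-noise levels
    """
    # P_i h_i per group; covariances between groups are ignored (the decoupling).
    ph = [mat_vec(P, h) for P, h in zip(covs, grads)]
    # Scaling term shared by all groups: a = 1 / (r + sum_i h_i^T P_i h_i).
    a = 1.0 / (r + sum(dot(h, v) for h, v in zip(grads, ph)))
    for w, P, v in zip(weights, covs, ph):
        k = [a * x for x in v]              # Kalman gain K_i = P_i h_i a
        for j in range(len(w)):
            w[j] += k[j] * error            # w_i <- w_i + K_i * error
            for m in range(len(w)):
                P[j][m] -= k[j] * v[m]      # P_i <- P_i - K_i h_i^T P_i
            P[j][j] += q                    # add diagonal process noise Q_i
    return a

# Toy usage: a linear model y = w . x split into two weight groups.
weights = [[0.0, 0.0], [0.0]]
covs = [[[100.0, 0.0], [0.0, 100.0]], [[100.0]]]
x = [[1.0, 2.0], [3.0]]   # for a linear model, the gradient equals the input
target = 4.0

error_before = target - sum(dot(w, h) for w, h in zip(weights, x))
dekf_step(weights, covs, x, error_before)
error_after = target - sum(dot(w, h) for w, h in zip(weights, x))
```

For this linear toy model, a single step shrinks the error on the presented sample by the factor r / (r + S), where S is the sum of h_i^T P_i h_i over the groups. In a recurrent network the gradients h_i would instead come from the network's own derivative computation.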