This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes the task of learning to represent long-term context for sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as the learning of a hidden state representation of long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., when the transition probability matrices are sparse and the model is essentially deterministic. The results in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.
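The diffusion effect described above can be illustrated with a small numerical sketch (not taken from the paper; the transition matrices below are hypothetical examples). Under an ergodic, dense transition matrix, state distributions started from different initial states converge to the same stationary distribution, so the initial context is forgotten; under a sparse, nearly deterministic matrix, the distributions stay distinguishable over many steps.

import numpy as np

# Hypothetical dense (ergodic) row-stochastic transition matrix.
ergodic_T = np.array([[0.5, 0.3, 0.2],
                      [0.3, 0.4, 0.3],
                      [0.2, 0.3, 0.5]])

# Hypothetical sparse, nearly deterministic matrix (close to a cyclic permutation).
near_deterministic_T = np.array([[0.01, 0.98, 0.01],
                                 [0.01, 0.01, 0.98],
                                 [0.98, 0.01, 0.01]])

def context_retention(T, steps=50):
    """Distance between two state distributions that start in different
    states after `steps` transitions; a value near zero means the initial
    context has diffused away."""
    p = np.array([1.0, 0.0, 0.0])   # chain started in state 0
    q = np.array([0.0, 1.0, 0.0])   # chain started in state 1
    for _ in range(steps):
        p = p @ T
        q = q @ T
    return np.abs(p - q).sum()

print("ergodic:            ", context_retention(ergodic_T))          # close to 0
print("near-deterministic: ", context_retention(near_deterministic_T))  # stays large

Running the sketch shows the ergodic matrix erasing the difference between the two starting states within a few dozen steps, while the nearly deterministic matrix preserves it, which is the sense in which near-0/1 transition probabilities mitigate the diffusion of context and credit.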