This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes the task of learning to represent long-term context for sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as the learning of a hidden state representation of long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., when the transition probability matrices are sparse and the model is essentially deterministic. The results in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.
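The diffusion effect described above can be illustrated with a small numerical sketch (not taken from the paper; the transition matrices below are hypothetical examples). Under an ergodic, dense transition matrix, state distributions started from different initial states converge to the same stationary distribution, so the initial context is forgotten; under a sparse, nearly deterministic matrix, the distributions stay distinguishable over many steps.

import numpy as np

# Hypothetical dense (ergodic) row-stochastic transition matrix.
ergodic_T = np.array([[0.5, 0.3, 0.2],
                      [0.3, 0.4, 0.3],
                      [0.2, 0.3, 0.5]])

# Hypothetical sparse, nearly deterministic matrix (close to a cyclic permutation).
near_deterministic_T = np.array([[0.01, 0.98, 0.01],
                                 [0.01, 0.01, 0.98],
                                 [0.98, 0.01, 0.01]])

def context_retention(T, steps=50):
    """Distance between two state distributions that start in different
    states after `steps` transitions; a value near zero means the initial
    context has diffused away."""
    p = np.array([1.0, 0.0, 0.0])   # chain started in state 0
    q = np.array([0.0, 1.0, 0.0])   # chain started in state 1
    for _ in range(steps):
        p = p @ T
        q = q @ T
    return np.abs(p - q).sum()

print("ergodic:            ", context_retention(ergodic_T))          # close to 0
print("near-deterministic: ", context_retention(near_deterministic_T))  # stays large

Running the sketch shows the ergodic matrix erasing the difference between the two starting states within a few dozen steps, while the nearly deterministic matrix preserves it, which is the sense in which near-0/1 transition probabilities mitigate the diffusion of context and credit.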