Buried Markov models for speech recognition

  • Authors: J. A. Bilmes
  • Affiliations: Int. Comput. Sci. Inst., Berkeley, CA, USA
  • Venue: ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02
  • Year: 1999


Abstract

Good HMM-based speech recognition performance requires that the conditional independence assumptions made by the HMM introduce at most minimal inaccuracies. In this work, those conditional independence assumptions are relaxed in a principled way: for each hidden state value, additional dependencies are added between observation elements to increase both accuracy and discriminability. These dependencies are chosen according to natural statistical dependencies extant in the training data that are not well modeled by an HMM. The result is called a buried Markov model (BMM) because the underlying Markov chain of an HMM is further hidden (buried) by the specific cross-observation dependencies. Gaussian mixture HMMs are extended to represent BMM dependencies, and new EM update equations are derived. In preliminary experiments on a large-vocabulary isolated-word speech database, BMMs achieve an 11% improvement in word error rate (WER) with only a 9.5% increase in the number of parameters, using a single state per monophone.
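
To make the cross-observation dependency idea concrete, here is a minimal sketch (all names, shapes, the linear dependency form, and the diagonal-covariance choice are assumptions for illustration, not the paper's formulation): a per-state Gaussian whose mean is shifted by a linear function of selected elements of the previous observation vector. Setting the dependency weights to zero recovers the plain HMM Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4            # observation dimensionality (assumed for the example)
deps = [0, 2]    # hypothetical indices of x_{t-1} this state depends on

# Plain-HMM parameters for one hidden state: mean and diagonal variance.
mu = rng.normal(size=D)
var = np.ones(D)

# BMM-style dependency weights: each output dimension gets a linear
# term in the selected elements of the previous observation.
B = 0.1 * rng.normal(size=(D, len(deps)))

def log_likelihood(x_t, x_prev):
    """Log N(x_t; mu + B @ x_prev[deps], diag(var))."""
    mean = mu + B @ x_prev[deps]   # dependency-shifted, state-conditioned mean
    diff = x_t - mean
    return -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var))

x_prev = rng.normal(size=D)
x_t = rng.normal(size=D)
print(log_likelihood(x_t, x_prev))  # with B = 0 this is the plain HMM Gaussian
```

In the paper's terms, the Markov chain is "buried" because the observation at time t now depends on parts of earlier observations as well as the hidden state; the sketch above fixes one state and one dependency set, whereas the actual model selects dependencies per hidden state value from the training data.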