The Lincoln tied-mixture HMM continuous speech recognizer

Authors:
D. B. Paul
Affiliations:
MIT Lincoln Lab., Lexington, MA, USA
Venue:
ICASSP '91: Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-91)
Year:
1991

Abstract

The Lincoln robust HMM (hidden Markov model) recognizer has been converted from a single Gaussian or Gaussian-mixture PDF per state to tied mixtures, in which a single set of Gaussians is shared among all states. Initial difficulties caused by the use of mixture pruning were cured by using observation pruning. Fixed-weight smoothing of the mixture weights allowed the use of word-boundary-context-dependent triphone models for both speaker-dependent (SD) and speaker-independent (SI) recognition. A second-differential observation stream further improved SI performance but not SD performance. A novel form of phonetic context model, the semiphone, is also introduced. This model significantly reduces the number of states required to model a vocabulary and unifies triphone and diphone modeling.
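
As an illustration of the tied-mixture idea described in the abstract, the sketch below evaluates one shared pool of Gaussians once per observation and lets each state combine those densities with its own mixture weights. It is a minimal toy under stated assumptions, not the Lincoln recognizer's implementation: the diagonal-covariance form, the function names, the toy parameter values, and the interpretation of fixed-weight smoothing as a fixed-coefficient interpolation between two weight vectors are all assumptions made here for illustration. The pruning schemes mentioned in the abstract are not shown, since the abstract does not define them.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at vector x (assumed form)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def shared_densities(x, means, variances):
    """Evaluate every Gaussian in the shared pool once for observation x."""
    return [diag_gaussian_pdf(x, m, v) for m, v in zip(means, variances)]

def state_likelihood(weights, densities):
    """Tied-mixture output probability: state-specific weights over shared Gaussians."""
    return sum(c * d for c, d in zip(weights, densities))

def smooth_weights(specific, general, lam=0.5):
    """Hypothetical smoothing: interpolate two weight vectors with a fixed coefficient lam."""
    return [lam * s + (1.0 - lam) * g for s, g in zip(specific, general)]

# Toy pool of two 1-D Gaussians shared by all states (values are made up).
means = [[0.0], [2.0]]
variances = [[1.0], [1.0]]
state_weights = {"s1": [0.9, 0.1], "s2": [0.2, 0.8]}

# The shared densities are computed once and reused by every state.
dens = shared_densities([0.0], means, variances)
likelihoods = {s: state_likelihood(w, dens) for s, w in state_weights.items()}

# Smooth a sparse, context-specific weight vector toward a more general one.
smoothed = smooth_weights(state_weights["s1"], [0.5, 0.5], lam=0.5)
```

In this toy setup the per-state cost is only a weighted sum, because the Gaussian evaluations are shared; the smoothed weights remain a valid distribution since both inputs sum to one.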