Structured speech modeling

Authors:
Li Deng; Dong Yu; A. Acero
Affiliations:
Microsoft Res., Redmond, WA;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Abstract

Modeling the dynamic structure of speech is a novel paradigm in speech recognition research within the generative modeling framework, and it offers the potential to overcome limitations of the current hidden Markov modeling approach. Analogous to structured language models, where syntactic structure is exploited to represent long-distance relationships among words, the structured speech model described in this paper uses the dynamic structure in the hidden vocal tract resonance space to characterize long-span contextual influence among phonetic units. We first give a general overview of hierarchically classified types of dynamic speech models in the literature. We then give a detailed account of one specific model type, the hidden trajectory model, describing the steps of model construction and the parameter estimation algorithms. We show how the use of resonance target parameters and their temporal filtering enables joint modeling of long-span coarticulation and phonetic reduction effects. Phonetic recognition experiments demonstrate performance superior to that of a modern hidden Markov model-based system. Error analysis shows that the greatest performance gain occurs within the sonorant speech class.
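
The idea of temporally filtering resonance targets can be illustrated with a minimal sketch. The abstract does not give the filter's form, so everything below is an assumption made for illustration: a symmetric, exponentially decaying FIR kernel, the hypothetical function name `filter_targets`, the values of `gamma` and `span`, and the example target frequencies. The sketch only shows how smoothing a piecewise-constant target sequence in both time directions lets neighboring segments influence each other (coarticulation) and keeps a short segment from reaching its target (reduction).

```python
import numpy as np


def filter_targets(targets, gamma=0.6, span=7):
    """Smooth a per-frame target sequence with a symmetric FIR filter.

    gamma (decay rate) and span (half-width in frames) are illustrative
    values, not parameters taken from the paper.
    """
    taps = np.arange(-span, span + 1)
    kernel = gamma ** np.abs(taps)   # weights decay with distance in both directions
    kernel /= kernel.sum()           # unit gain: a constant target is left unchanged
    padded = np.pad(targets, span, mode="edge")  # hold the boundary targets at the edges
    return np.convolve(padded, kernel, mode="valid")


# Hypothetical resonance targets in Hz: a short 3-frame segment between two long ones.
targets = np.concatenate([
    np.full(10, 500.0),
    np.full(3, 1500.0),
    np.full(10, 500.0),
])
trajectory = filter_targets(targets)

# The short middle segment undershoots its 1500 Hz target.
print(f"peak of filtered trajectory: {trajectory.max():.1f} Hz")
```

Because the kernel spans several frames on either side, each output frame depends on targets from adjacent segments, and shortening the middle segment lowers the peak further. Under these assumptions a single smoothing mechanism accounts for both context dependence and undershoot.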