Speech Recognition Using Linear Dynamic Models

Authors:
Joe Frankel; Simon King
Affiliations:
Centre for Speech Technol. Res., Edinburgh Univ.;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Cited by (4):
Factoring Gaussian precision matrices for linear dynamic models (Pattern Recognition Letters)
Review: Statistical parametric speech synthesis (Speech Communication)
Spectral histogram of oriented gradients (SHOGs) for Tamil language male/female speaker classification (International Journal of Speech Technology)
Continuous speech recognition using linear dynamic models (International Journal of Speech Technology)

Abstract

The majority of automatic speech recognition systems rely on hidden Markov models (HMMs), in which Gaussian mixtures model the output distributions associated with sub-phone states. This approach, whilst successful, treats consecutive feature vectors (augmented to include derivative information) as statistically independent. Furthermore, spatial correlations present in speech parameters are frequently ignored through the use of diagonal covariance matrices. This paper continues the work of Digalakis and others, who proposed instead a first-order linear state-space model that can model underlying dynamics and also provides a model of spatial correlations. This paper examines the assumptions made in applying such a model and shows that adding a hidden dynamic state increases accuracy over otherwise equivalent static models. We also propose a time-asynchronous decoding strategy suited to recognition with segment models. We describe the implementation of decoding for linear dynamic models and present TIMIT phone recognition results.
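To make the contrast with frame-independent HMM output distributions concrete, here is a minimal sketch of how a first-order linear state-space model can score a segment of observations. It assumes a scalar hidden state and a scalar observation, and it uses notation chosen for illustration: the names `F`, `H`, the noise means/variances, and the helper functions are hypothetical, not taken from the paper. One standard way to evaluate the exact likelihood of such a model is a Kalman filter, which scores each frame against a prediction conditioned on all earlier frames. Setting `F = 0` (with the initial state distributed like the state noise) removes the dynamics and collapses the model to independent Gaussians, the static case. The paper's models use vector states and full covariance matrices, which this scalar version does not capture.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScalarLDM:
    """Hypothetical scalar linear dynamic model (illustrative notation only):

    x_1 ~ N(init_mean, init_var)
    x_t = F * x_{t-1} + w_t,   w_t ~ N(state_noise_mean, state_noise_var)
    y_t = H * x_t + v_t,       v_t ~ N(obs_noise_mean, obs_noise_var)
    """
    F: float
    H: float
    state_noise_mean: float
    state_noise_var: float
    obs_noise_mean: float
    obs_noise_var: float
    init_mean: float
    init_var: float


def log_likelihood(model: ScalarLDM, ys: Sequence[float]) -> float:
    """Exact log p(y_1..y_T) via a Kalman filter (prediction-error decomposition)."""
    total = 0.0
    m_filt = p_filt = 0.0  # filtered state mean/variance, overwritten after frame 1
    for t, y in enumerate(ys):
        if t == 0:
            m_pred, p_pred = model.init_mean, model.init_var
        else:
            # Predict: carry the hidden state forward through the dynamics.
            m_pred = model.F * m_filt + model.state_noise_mean
            p_pred = model.F * p_filt * model.F + model.state_noise_var
        # Predicted distribution of this frame: N(y_hat, s).
        y_hat = model.H * m_pred + model.obs_noise_mean
        s = model.H * p_pred * model.H + model.obs_noise_var
        err = y - y_hat
        total += -0.5 * (math.log(2.0 * math.pi * s) + err * err / s)
        # Update: condition the hidden state on the observed frame.
        gain = p_pred * model.H / s
        m_filt = m_pred + gain * err
        p_filt = (1.0 - gain * model.H) * p_pred
    return total


def static_log_likelihood(mean: float, var: float, ys: Sequence[float]) -> float:
    """Frame-independent Gaussian log-likelihood, as assumed by a single HMM state."""
    return sum(-0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var) for y in ys)


if __name__ == "__main__":
    frames = [0.1, 0.4, 0.9, 1.3, 1.6]  # made-up trajectory for illustration
    dynamic = ScalarLDM(F=0.9, H=1.0, state_noise_mean=0.2, state_noise_var=0.05,
                        obs_noise_mean=0.0, obs_noise_var=0.01, init_mean=0.0, init_var=0.1)
    mu = sum(frames) / len(frames)
    var = sum((y - mu) ** 2 for y in frames) / len(frames)
    print("LDM log-likelihood:", log_likelihood(dynamic, frames))
    print("static (ML Gaussian) log-likelihood:", static_log_likelihood(mu, var, frames))
```

Because the filter carries its state estimate from frame to frame, consecutive observations are no longer scored independently; with `F = 0` every prediction is identical and the score reduces to the static Gaussian case.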