Continuous speech recognition using linear dynamic models

  • Authors and affiliations:
  • Tao Ma — Siri at Apple Inc., Cupertino, USA 95014
  • Sundararajan Srinivasan — Nuance Communications Inc., Sunnyvale, USA 94085
  • Georgios Lazarou — The New York City Transit Authority, New York, USA 11103
  • Joseph Picone — Department of Electrical and Computer Engineering, Temple University, Philadelphia, USA 19027

  • Venue:
  • International Journal of Speech Technology
  • Year:
  • 2014


Abstract

Hidden Markov models (HMMs) with Gaussian mixture distributions rely on the assumption that speech features are temporally uncorrelated, and often assume a diagonal covariance matrix, ignoring correlations between feature vectors of adjacent frames. A Linear Dynamic Model (LDM) is a Markovian state-space model that also relies on hidden-state modeling, but explicitly models the evolution of these hidden states using an autoregressive process. An LDM is capable of modeling higher-order statistics and can exploit correlations among features in an efficient and parsimonious manner. In this paper, we present a hybrid LDM/HMM decoder architecture that postprocesses segmentations derived from the first pass of an HMM-based recognition system. This smoothed trajectory model is complementary to existing HMM systems. An Expectation-Maximization (EM) approach for parameter estimation is presented. We demonstrate a 13% relative WER reduction on the Aurora-4 clean evaluation set, and a 13% relative WER reduction on the babble noise condition.
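The generative structure of an LDM described above (hidden states evolving autoregressively, observations as a noisy linear map of the state) can be sketched as follows. All dimensions, matrices, and noise levels here are illustrative assumptions for exposition, not parameters from the paper:

```python
import numpy as np

# Minimal sketch of a Linear Dynamic Model (LDM) as a Markovian
# state-space model. All dimensions and parameter values are
# illustrative assumptions, not taken from the paper.
#
# Hidden state evolves autoregressively:   x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q)
# Observations are a noisy linear mapping: y_t = H x_t + v_t,      v_t ~ N(0, R)

rng = np.random.default_rng(0)

state_dim, obs_dim, T = 2, 3, 50

F = np.array([[0.9, 0.1],
              [0.0, 0.8]])                      # state transition (autoregressive) matrix
H = rng.standard_normal((obs_dim, state_dim))   # observation matrix
Q = 0.01 * np.eye(state_dim)                    # process noise covariance
R = 0.10 * np.eye(obs_dim)                      # observation noise covariance

def sample_ldm(T):
    """Draw one trajectory of hidden states and observed feature vectors."""
    x = np.zeros(state_dim)
    states, obs = [], []
    for _ in range(T):
        x = F @ x + rng.multivariate_normal(np.zeros(state_dim), Q)
        y = H @ x + rng.multivariate_normal(np.zeros(obs_dim), R)
        states.append(x)
        obs.append(y)
    return np.array(states), np.array(obs)

states, obs = sample_ldm(T)
print(states.shape, obs.shape)  # (50, 2) (50, 3)
```

Because consecutive observations share correlated hidden states, a model of this form captures inter-frame feature correlations that a diagonal-covariance GMM/HMM ignores; in practice the parameters F, H, Q, R would be estimated per segment via EM rather than fixed as above.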