Dual stream speech recognition using articulatory syllable models

  • Authors:
  • Antti Puurula; Dirk Compernolle

  • Affiliations:
  • ESAT, Katholieke Universiteit Leuven, Leuven, Belgium 3001; ESAT, Katholieke Universiteit Leuven, Leuven, Belgium 3001

  • Venue:
  • International Journal of Speech Technology
  • Year:
  • 2010

Abstract

Recent theoretical developments in neuroscience suggest that sublexical speech processing occurs via two parallel processing pathways. According to this Dual Stream Model of Speech Processing, speech is processed both as sequences of speech sounds and as sequences of articulations. We attempt to revise the "beads-on-a-string" paradigm of Hidden Markov Models (HMMs) in Automatic Speech Recognition (ASR) by implementing a system for dual stream speech recognition. A baseline recognition system is enhanced by modeling articulations as sequences of syllables. An efficient model that complements HMMs is developed by formulating Dynamic Time Warping (DTW) as a probabilistic model. The DTW Model (DTWM) is improved by enriching syllable templates with constrained covariance matrices, data imputation, clustering, and mixture modeling. The resulting dual stream system is evaluated on the N-Best Southern Dutch Broadcast News benchmark. Promising results are obtained in DTWM classification and ASR tests. We discuss the remaining problems in implementing dual stream speech recognition.