Dual stream speech recognition using articulatory syllable models

Authors:
Antti Puurula;Dirk Van Compernolle
Affiliations:
ESAT, Katholieke Universiteit Leuven, Leuven, Belgium 3001;ESAT, Katholieke Universiteit Leuven, Leuven, Belgium 3001
Venue:
International Journal of Speech Technology
Year:
2010


Abstract

Recent theoretical developments in neuroscience suggest that sublexical speech processing occurs via two parallel processing pathways. According to this Dual Stream Model of Speech Processing, speech is processed both as sequences of speech sounds and as sequences of articulations. We attempt to revise the "beads-on-a-string" paradigm of Hidden Markov Models (HMMs) in Automatic Speech Recognition (ASR) by implementing a system for dual stream speech recognition. A baseline recognition system is enhanced by modeling articulations as sequences of syllables. An efficient model that complements HMMs is developed by formulating Dynamic Time Warping (DTW) as a probabilistic model. The DTW Model (DTWM) is improved by enriching syllable templates with constrained covariance matrices, data imputation, clustering, and mixture modeling. The resulting dual stream system is evaluated on the N-Best Southern Dutch Broadcast News benchmark. Promising results are obtained in DTWM classification and ASR tests. We discuss the remaining problems in implementing dual stream speech recognition.
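
The abstract does not spell out how DTW is formulated as a probabilistic model, so the following is only a minimal sketch of one common reading, not the authors' DTWM. It assumes each template frame carries a mean and a diagonal variance vector, takes the negative Gaussian log-likelihood as the local cost, and uses the standard three-way DTW step pattern. The function names (`frame_neg_log_lik`, `dtw_neg_log_lik`, `classify`) and the toy data are hypothetical.

```python
import math


def frame_neg_log_lik(x, mean, var):
    """Negative log-likelihood of frame x under a diagonal Gaussian.

    Assumption: a diagonal covariance stands in for the paper's
    "constrained covariance matrices", whose exact form is not given.
    """
    return 0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def dtw_neg_log_lik(frames, template_means, template_vars):
    """Best-path DTW score using Gaussian negative log-likelihoods as local costs.

    Lower is better. The step pattern (vertical, horizontal, diagonal)
    is the textbook symmetric one, chosen here for illustration.
    """
    n, t = len(frames), len(template_means)
    inf = float("inf")
    cost = [[inf] * (t + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, t + 1):
            local = frame_neg_log_lik(
                frames[i - 1], template_means[j - 1], template_vars[j - 1]
            )
            best_prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = local + best_prev
    return cost[n][t]


def classify(frames, templates):
    """Return the label of the template with the lowest DTW score.

    templates maps a label to a (means, variances) pair.
    """
    return min(
        templates,
        key=lambda label: dtw_neg_log_lik(frames, *templates[label]),
    )


# Toy one-dimensional "syllable" templates (hypothetical data).
toy_templates = {
    "ba": ([[0.0], [1.0]], [[1.0], [1.0]]),
    "da": ([[5.0], [6.0]], [[1.0], [1.0]]),
}
toy_frames = [[0.1], [0.9], [1.1]]
predicted = classify(toy_frames, toy_templates)
```

In this sketch, scoring a template amounts to finding the alignment path with the highest product of per-frame Gaussian likelihoods. This is one way a template matcher can produce scores on a scale comparable to HMM log-likelihoods.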