Modelling pronunciation variation with single-path and multi-path syllable models: Issues to consider

Authors:
Annika Hämäläinen; Louis ten Bosch; Lou Boves
Affiliations:
Centre for Language and Speech Technology (CLST), Faculty of Arts, Radboud University Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands (all authors)
Venue:
Speech Communication
Year:
2009

Abstract

In this paper, we construct context-independent single-path and multi-path syllable models aimed at improved pronunciation variation modelling. We use phonetic transcriptions to define the topologies of the syllable models and to initialise the model parameters, and the Baum-Welch algorithm for the re-estimation of the model parameters. We hypothesise that the richer topology of multi-path syllable models would be better at accounting for pronunciation variation than context-dependent phone models that can only account for the effects of the left and right neighbours, or single-path syllable models whose power of modelling segmental variation would seem to be limited. However, both context-dependent phone models and single-path syllable models outperform multi-path syllable models on a large-vocabulary continuous speech recognition task. Careful analyses of the errors made by the recognisers with single-path and multi-path syllable models show that the most important factors affecting the speech recognition performance are syllable context and lexical confusability. In addition, the speech recognition results suggest that the benefits of the greater acoustic modelling accuracy of the multi-path syllable models can only be reaped if the information about the syllable-level pronunciation variation can be linked with the word-level information in the language model.
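
As a rough illustration of how phonetic transcriptions could define the topology of a syllable model, the sketch below derives a single-path and a multi-path state layout from a set of observed transcriptions of one syllable. It is not the authors' implementation: the three-states-per-phone layout, the one-parallel-path-per-variant design, the `max_paths` cut-off, the function names, and the example transcriptions are all assumptions made for illustration. The entry probabilities are only initial values of the kind that Baum-Welch re-estimation would later update.

```python
from collections import Counter

# Assumption: each phone contributes three left-to-right emitting states.
STATES_PER_PHONE = 3


def phone_states(variant):
    """Expand a phone sequence into a flat list of state labels."""
    return [f"{phone}_{i}" for phone in variant for i in range(STATES_PER_PHONE)]


def single_path_topology(transcriptions):
    """Build one path from the most frequent transcription of the syllable."""
    counts = Counter(tuple(t) for t in transcriptions)
    variant, _ = counts.most_common(1)[0]
    return phone_states(variant)


def multi_path_topology(transcriptions, max_paths=3):
    """Build one parallel path per frequent transcription variant.

    Each path gets an initial entry probability proportional to how often
    its variant was observed among the variants that were kept.
    """
    counts = Counter(tuple(t) for t in transcriptions)
    top = counts.most_common(max_paths)
    kept = sum(count for _, count in top)
    return [
        {"states": phone_states(variant), "entry_prob": count / kept}
        for variant, count in top
    ]


# Hypothetical transcriptions of one syllable: a full form seen three times
# and a reduced form (final consonant deleted) seen once.
observed = [["d", "A", "t"]] * 3 + [["d", "A"]]

single = single_path_topology(observed)
paths = multi_path_topology(observed)

for path in paths:
    print(path["entry_prob"], path["states"])
```

In this sketch the single-path model keeps only the most frequent variant, so the reduced form has to be absorbed by the same states, whereas the multi-path model gives it a dedicated path. That contrast is the one the abstract draws between the two model types.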