A data-driven method for modeling pronunciation variation

Authors:
Judith M. Kessens;Catia Cucchiarini;Helmer Strik
Affiliations:
A2 RT, Department of Language and Speech, University of Nijmegen, P.O. Box 9103, 5600 HD Nijmegen, The Netherlands;A2 RT, Department of Language and Speech, University of Nijmegen, P.O. Box 9103, 5600 HD Nijmegen, The Netherlands;A2 RT, Department of Language and Speech, University of Nijmegen, P.O. Box 9103, 5600 HD Nijmegen, The Netherlands
Venue:
Speech Communication
Year:
2003

Abstract

This paper describes a rule-based, data-driven (DD) method for modeling pronunciation variation in automatic speech recognition (ASR). The DD method consists of three steps. First, the possible pronunciation variants are generated by making each phone in the canonical transcription of a word optional. Next, forced recognition is performed to determine which variant best matches the acoustic signal. Finally, rules are derived by aligning the best-matching variant with the canonical transcription of the word. Error analysis is performed to gain insight into the process of pronunciation modeling. This analysis shows that modeling pronunciation variation brings about improvements but also introduces deteriorations. A strong correlation is found between the number of improvements and the number of deteriorations per rule. This result indicates that ASR performance cannot be improved simply by excluding the rules that cause deteriorations, because these rules also produce a considerable number of improvements. Finally, we compare three different criteria for rule selection. This comparison indicates that the absolute frequency of rule application (Fabs) is the most suitable criterion. For the best testing condition, a statistically significant reduction in word error rate (WER) of 1.4% absolute, or 8% relative, is found.
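
The three steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`generate_variants`, `deletion_rules`, `rule_counts`), the `(left, deleted, right)` rule format with `#` as a word-boundary marker, and the example phone strings are all assumptions made for illustration. Forced recognition requires trained acoustic models and is not implemented here; the "best-matching" variants are supplied by hand as hypothetical stand-ins.

```python
from collections import Counter
from itertools import combinations


def generate_variants(canonical):
    """Step 1: make each phone optional, i.e. list every non-empty
    subsequence of the canonical transcription (full form first)."""
    n = len(canonical)
    variants = []
    for k in range(n, 0, -1):
        for keep in combinations(range(n), k):
            variants.append(tuple(canonical[i] for i in keep))
    # Repeated phones can yield identical variants; drop duplicates, keep order.
    return list(dict.fromkeys(variants))


def deletion_rules(canonical, variant):
    """Step 3: align a variant with its canonical transcription and emit
    one (left, deleted, right) rule per deleted phone.

    Assumption: the variant is a subsequence of the canonical form, so a
    greedy left-to-right alignment is sufficient. Context phones are taken
    from the canonical transcription; '#' marks a word boundary.
    """
    rules = []
    j = 0  # position in the variant
    for i, phone in enumerate(canonical):
        if j < len(variant) and variant[j] == phone:
            j += 1  # phone was realized
        else:
            left = canonical[i - 1] if i > 0 else "#"
            right = canonical[i + 1] if i + 1 < len(canonical) else "#"
            rules.append((left, phone, right))
    if j != len(variant):
        raise ValueError("variant is not a subsequence of the canonical form")
    return rules


def rule_counts(pairs):
    """Count how often each rule applies over (canonical, best_variant)
    pairs, in the spirit of an absolute-frequency criterion such as Fabs."""
    counts = Counter()
    for canonical, best in pairs:
        counts.update(deletion_rules(canonical, best))
    return counts


if __name__ == "__main__":
    canonical = ["d", "A", "t"]  # hypothetical phone string
    print(len(generate_variants(canonical)))  # 7 = 2**3 - 1

    # Step 2 stand-in: pretend forced recognition selected these variants.
    pairs = [
        (["d", "A", "t"], ["d", "A"]),
        (["n", "i", "t"], ["n", "i"]),
        (["d", "A", "t"], ["d", "A", "t"]),
    ]
    for rule, freq in rule_counts(pairs).items():
        print(rule, freq)
```

Note that the number of candidate variants grows exponentially with word length (up to 2^n - 1 for n phones), so a practical system would need some way of constraining or compactly representing them; how the paper handles this is not stated in the abstract.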