Generating pronunciation variants of words is an important subject in speech research and is used extensively in automatic speech recognition and segmentation systems. Decision trees are well-known tools for modeling pronunciation over words or sub-word units. With word units and a very large vocabulary, training the necessary decision trees requires a huge amount of speech data: the training corpus must contain every word in the vocabulary with a sufficient number of repetitions of each. Moreover, an extra corpus is needed for every word that is not included in the original training corpus but may later be added to the vocabulary. To overcome these drawbacks, we have designed generalized decision trees, which can be trained on a medium-sized corpus over groups of similar words that share pronunciation information, instead of training a separate tree for every single word. Generalized decision trees predict the places in a word where substitution, deletion, and insertion of phonemes may occur. Appropriate statistical contextual rules are then applied at the permitted places to determine the specific word variants. Hybrids of generalized decision trees and contextual rules are designed in static and dynamic versions. The hybrid static pronunciation models simultaneously take into account word phonological structure, unigram probabilities, stress, and phone-context information, while the hybrid dynamic models consider an extra feature, speaking rate, when generating pronunciation variants. Using the word variants generated by the static and dynamic models in the lexicon of the SHENAVA Persian continuous speech recognizer yields relative word error rate reductions of up to 8.1% and 11.6%, respectively.
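To make the rule-application step concrete, the following is a minimal sketch (not the authors' trained models) of how contextual pronunciation rules can expand a canonical phoneme sequence into variants. The rule table, phoneme symbols, and matching scheme here are hypothetical illustrations; in the paper, the permitted edit sites come from generalized decision trees and the rules are learned statistically.

```python
# Minimal sketch: enumerate pronunciation variants of a word by applying
# hand-written contextual rules (focus phoneme, left context, right context).
# All rules and symbols below are illustrative, not taken from the paper.

from itertools import product

# Hypothetical rules: (focus, left, right) -> alternative realizations.
# "" means deletion; "*" matches any context; "#" marks a word boundary.
RULES = {
    ("t", "*", "#"): ["t", ""],       # word-final /t/ may delete
    ("ae", "*", "n"): ["ae", "eh"],   # /ae/ before /n/ may reduce
}

def variants(phones):
    """Enumerate pronunciation variants of a phoneme sequence."""
    padded = ["#"] + phones + ["#"]   # add word-boundary markers
    options = []
    for i, p in enumerate(phones):
        left, right = padded[i], padded[i + 2]
        alts = None
        for (f, l, r), outs in RULES.items():
            if f == p and l in ("*", left) and r in ("*", right):
                alts = outs
                break
        options.append(alts if alts is not None else [p])
    # Cartesian product of per-position choices, dropping deletions.
    return sorted({" ".join(x for x in combo if x)
                   for combo in product(*options)})

print(variants(["p", "ae", "n", "t"]))
# -> ['p ae n', 'p ae n t', 'p eh n', 'p eh n t']
```

A real system would attach probabilities to each rule and prune low-probability variants before adding them to the recognizer's lexicon.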