Generating pronunciation variants of words is an important subject in speech research and is used extensively in automatic speech recognition and segmentation systems. Decision trees are well-known tools for modeling pronunciation over words or sub-word units. With word units and a very large vocabulary, training the necessary decision trees requires a huge amount of speech data: the training corpus must contain every word in the vocabulary with a sufficient number of repetitions of each. Moreover, an extra corpus is needed for every word that is not included in the original training corpus but may later be added to the vocabulary. To overcome these drawbacks, we have designed generalized decision trees, which can be trained on a medium-sized corpus over groups of similar words that share pronunciation information, instead of training a separate tree for every single word. Generalized decision trees predict the places in a word where substitution, deletion, and insertion of phonemes may occur. Appropriate statistical contextual rules are then applied at the permitted places to determine the specific word variants. Hybrids of generalized decision trees and contextual rules are designed in static and dynamic versions. The hybrid static pronunciation models simultaneously take into account word phonological structure, unigram probabilities, stress, and phone-context information, while the hybrid dynamic models consider an extra feature, speaking rate, when generating pronunciation variants. Using the word variants generated by the static and dynamic models in the lexicon of the SHENAVA Persian continuous speech recognizer yields relative word error rate reductions of up to 8.1% and 11.6%, respectively.
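To make the rule-application step concrete, the following is a minimal sketch (not the authors' trained models) of how contextual pronunciation rules can expand a canonical phoneme sequence into variants. The rule table, phoneme symbols, and matching scheme here are hypothetical illustrations; in the paper, the permitted edit sites come from generalized decision trees and the rules are learned statistically.

```python
# Minimal sketch: enumerate pronunciation variants of a word by applying
# hand-written contextual rules (focus phoneme, left context, right context).
# All rules and symbols below are illustrative, not taken from the paper.

from itertools import product

# Hypothetical rules: (focus, left, right) -> alternative realizations.
# "" means deletion; "*" matches any context; "#" marks a word boundary.
RULES = {
    ("t", "*", "#"): ["t", ""],       # word-final /t/ may delete
    ("ae", "*", "n"): ["ae", "eh"],   # /ae/ before /n/ may reduce
}

def variants(phones):
    """Enumerate pronunciation variants of a phoneme sequence."""
    padded = ["#"] + phones + ["#"]   # add word-boundary markers
    options = []
    for i, p in enumerate(phones):
        left, right = padded[i], padded[i + 2]
        alts = None
        for (f, l, r), outs in RULES.items():
            if f == p and l in ("*", left) and r in ("*", right):
                alts = outs
                break
        options.append(alts if alts is not None else [p])
    # Cartesian product of per-position choices, dropping deletions.
    return sorted({" ".join(x for x in combo if x)
                   for combo in product(*options)})

print(variants(["p", "ae", "n", "t"]))
# -> ['p ae n', 'p ae n t', 'p eh n', 'p eh n t']
```

A real system would attach probabilities to each rule and prune low-probability variants before adding them to the recognizer's lexicon.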