A statistical model for generating pronunciation networks

Authors:
M. D. Riley
Affiliations:
AT&T Bell Labs., Murray Hill, NJ, USA
Venue:
ICASSP '91: Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing
Year:
1991

Abstract

Methods to predict detailed phonetic pronunciations from a coarse phonemic transcription are described. The phonemic base forms, obtainable from orthographic text by dictionary lookup and other means, do not specify fine phonetic detail such as flapping, glottal stop insertion, or the formation of syllabic nasals and liquids. These phenomena depend on the phonetic context (often spanning word boundaries), stress environment, speaking rate, and dialect. A procedure is presented that builds decision trees, trained on the TIMIT database, using some of these features to predict pronunciation alternatives. On test data from the same corpus, the resulting phonetic network predicts the correct pronunciation of a phoneme approximately 83% of the time and places the correct phone among its top five guesses 99% of the time.
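
The general idea of a decision tree that maps phoneme context to ranked phone alternatives can be illustrated with a minimal sketch. It is not the paper's procedure: the entropy-based splitting rule, the feature names (`prev`, `next`), the function names, and the toy rows for the phoneme /t/ are all assumptions made for illustration, and none of the data comes from TIMIT. The labels `t`, `dx` (flap), and `q` (glottal stop) follow TIMIT-style phone codes. Each leaf stores a count of observed phones, so a prediction is a ranked list that can be scored by top-1 or top-k accuracy.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of phone labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def build_tree(rows, features, min_gain=1e-9):
    """Grow a tree over categorical context features.

    rows: list of (feature_dict, phone) pairs.
    Splits on the feature with the largest information gain (an assumed
    criterion) and stops when no feature reduces entropy.
    """
    labels = [phone for _, phone in rows]
    base = entropy(labels)
    best = None
    for f in features:
        groups = {}
        for feats, phone in rows:
            groups.setdefault(feats[f], []).append((feats, phone))
        remainder = sum(
            len(g) / len(rows) * entropy([p for _, p in g])
            for g in groups.values()
        )
        gain = base - remainder
        if gain > min_gain and (best is None or gain > best[0]):
            best = (gain, f, groups)
    if best is None:
        return {"leaf": Counter(labels)}
    _, f, groups = best
    rest = [x for x in features if x != f]
    return {
        "feature": f,
        "children": {v: build_tree(g, rest, min_gain) for v, g in groups.items()},
        "fallback": Counter(labels),  # used for feature values unseen in training
    }


def ranked_phones(tree, feats):
    """Return candidate phones for a context, most frequent first."""
    while "leaf" not in tree:
        child = tree["children"].get(feats[tree["feature"]])
        if child is None:
            return [p for p, _ in tree["fallback"].most_common()]
        tree = child
    return [p for p, _ in tree["leaf"].most_common()]


def top_k_accuracy(tree, rows, k):
    """Fraction of rows whose true phone is among the top k guesses."""
    hits = sum(phone in ranked_phones(tree, feats)[:k] for feats, phone in rows)
    return hits / len(rows)


# Invented toy observations for the phoneme /t/ (not TIMIT data).
TOY_ROWS = (
    [({"prev": "vowel", "next": "unstressed_vowel"}, "dx")] * 3
    + [({"prev": "vowel", "next": "unstressed_vowel"}, "t")]
    + [({"prev": "vowel", "next": "stressed_vowel"}, "t")] * 2
    + [({"prev": "pause", "next": "stressed_vowel"}, "t")] * 2
    + [({"prev": "vowel", "next": "nasal"}, "q")] * 2
)

tree = build_tree(TOY_ROWS, ["prev", "next"])
print(ranked_phones(tree, {"prev": "vowel", "next": "unstressed_vowel"}))
print(top_k_accuracy(tree, TOY_ROWS, 1), top_k_accuracy(tree, TOY_ROWS, 2))
```

Scoring the toy tree on its own training rows only demonstrates the mechanics; the accuracy figures quoted in the abstract refer to held-out test data from TIMIT and cannot be reproduced from this sketch.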