Sequence-based pronunciation modeling using a noisy-channel approach
IWSDS'10: Proceedings of the Second International Conference on Spoken Dialogue Systems for Ambient Environments
This paper reports on an ongoing study of pronunciation variation modeling for conversational speech recognition, in which the mapping from canonical pronunciations (baseforms) to the actually realized phonemes (surface forms) is modeled by a Bayesian network. The advantage of this graphical-model framework is that the probabilistic relationships among baseforms, surface forms, and any additional knowledge sources can be learned in a unified manner, so knowledge sources from different domains can be incorporated easily. In this preliminary study, we investigate the dependency of each surface phoneme on the current, preceding, and succeeding baseform phonemes, on the position of the current baseform phoneme in the word, and on whether the preceding surface phoneme was deleted. The proposed method was evaluated on spontaneous telephone conversations from a portion of the Switchboard corpus. Experimental results show that it yields consistent improvements in word accuracy over the standard pronunciation dictionary.
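To make the dependency structure concrete, here is a minimal sketch of one node of such a network: the surface phoneme, conditioned on the five parents named above. It assumes baseform/surface alignments are already available, with exactly one surface symbol per baseform phoneme and a hypothetical `"-"` marking deletion (insertions are ignored). It also assumes a hypothetical four-way word-position coding and additive smoothing with a uniform fallback for unseen contexts. When every node is observed during training, maximum-likelihood estimation of the conditional probability table reduces to counting, which is all this sketch does. The names `SurfaceCPT`, `contexts`, and `word_position` are illustrative and are not the authors' implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical symbols: "-" marks a deleted surface phoneme, "#" pads word edges.
DELETED = "-"
BOUNDARY = "#"

# Parents of the surface-phoneme node, as listed in the abstract:
# (preceding baseform, current baseform, succeeding baseform,
#  position in word, whether the preceding surface phoneme was deleted)
Context = Tuple[str, str, str, str, bool]
Alignment = List[Tuple[str, str]]  # one (baseform, surface) pair per baseform phoneme


def word_position(i: int, n: int) -> str:
    """Assumed four-way coding of the phoneme's position within the word."""
    if n == 1:
        return "single"
    if i == 0:
        return "initial"
    if i == n - 1:
        return "final"
    return "medial"


def contexts(alignment: Alignment) -> Iterator[Tuple[Context, str]]:
    """Yield (parent context, observed surface phoneme) for every phoneme of one word."""
    n = len(alignment)
    for i, (base, surf) in enumerate(alignment):
        prev_base = alignment[i - 1][0] if i > 0 else BOUNDARY
        next_base = alignment[i + 1][0] if i < n - 1 else BOUNDARY
        prev_deleted = i > 0 and alignment[i - 1][1] == DELETED
        yield (prev_base, base, next_base, word_position(i, n), prev_deleted), surf


class SurfaceCPT:
    """Conditional probability table P(surface | context) with additive smoothing."""

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha
        self.counts: Dict[Context, Counter] = defaultdict(Counter)
        self.surface_vocab: set = set()

    def fit(self, words: List[Alignment]) -> "SurfaceCPT":
        # With every node observed in training, maximum-likelihood CPT
        # estimation reduces to counting co-occurrences.
        for alignment in words:
            for ctx, surf in contexts(alignment):
                self.counts[ctx][surf] += 1
                self.surface_vocab.add(surf)
        return self

    def prob(self, ctx: Context, surf: str) -> float:
        # Unseen contexts fall back to a uniform distribution over surface symbols.
        observed = self.counts.get(ctx, Counter())
        total = sum(observed.values())
        return (observed[surf] + self.alpha) / (total + self.alpha * len(self.surface_vocab))

    def best(self, ctx: Context) -> str:
        """Most probable surface phoneme for a given context."""
        return max(self.surface_vocab, key=lambda s: self.prob(ctx, s))


# Toy usage: word-final /d/ in "and" is deleted in two of three tokens.
train = [
    [("ae", "ae"), ("n", "n"), ("d", DELETED)],
    [("ae", "ae"), ("n", "n"), ("d", DELETED)],
    [("ae", "ae"), ("n", "n"), ("d", "d")],
]
cpt = SurfaceCPT(alpha=0.1).fit(train)
ctx = ("n", "d", BOUNDARY, "final", False)
print(round(cpt.prob(ctx, DELETED), 3))  # → 0.618
print(cpt.best(ctx))                     # → -
```

Conditioning on five parents fragments the training data quickly, so a practical system would likely back off to smaller parent sets rather than to the uniform distribution used here. Additional knowledge sources would enter as extra fields in the `Context` tuple.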