Sequence-based pronunciation modeling using a noisy-channel approach

  • Authors:
  • Hansjörg Hofmann, Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura, Wolfgang Minker

  • Affiliations:
  • National Institute of Information and Communications Technology, Japan (Hofmann, Sakti, Isotani, Kawai, Nakamura); University of Ulm, Germany (Hofmann, Minker)

  • Venue:
  • IWSDS'10: Proceedings of the Second International Conference on Spoken Dialogue Systems for Ambient Environments
  • Year:
  • 2010

Abstract

Previous approaches to spontaneous speech recognition address the multiple-pronunciation problem by modeling pronunciation alternations at the phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this paper we model sequence-based pronunciation variation with a noisy-channel approach, in which the spontaneous phoneme sequence is treated as a "noisy" string and the goal is to recover the "clean" string of the word sequence. In this way, the whole word sequence and its effect on the alternation of the phonemes are taken into account. Moreover, the system learns not only the phoneme transformations but also the mapping from phonemes to words directly. In this preliminary study, the phonemes are first recognized with the existing recognition system, and the noisy-channel pronunciation variation model then maps from the phoneme level to the word level. Our experiments use Switchboard as the spontaneous speech corpus. The results show that the proposed method consistently improves word accuracy over the conventional recognition system; the best system achieves a relative improvement of up to 38.9% over the baseline speech recognizer.
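
For orientation, the abstract's formulation follows the standard noisy-channel decoding rule, which can be sketched as below; the symbols \Phi (observed spontaneous phoneme sequence) and W (word sequence) are illustrative notation chosen here and are not taken from the paper:

\hat{W} = \arg\max_{W} P(W \mid \Phi) = \arg\max_{W} P(\Phi \mid W)\, P(W)

Here P(\Phi \mid W) plays the role of the channel (pronunciation variation) model and P(W) is the language model; decoding recovers the "clean" word sequence from the "noisy" phoneme string.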