Statistical transformation of language and pronunciation models for spontaneous speech recognition

Authors:
Yuya Akita;Tatsuya Kawahara
Affiliations:
Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan;Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Citing 9
Cited 1

A maximum entropy approach to natural language processing

Computational Linguistics
Automatic generation of multiple pronunciations based on neural networks

Speech Communication
Stochastic pronunciation modelling from hand-labelled phonetic corpora

Speech Communication - Special issue on modeling pronunciation variation for automatic speech recognition
Automatic Alternative Transcription Generation and Vocabulary Selection for Flexible Word Recognizers

ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97)-Volume 2 - Volume 2
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
Extended models and tools for high-performance part-of-speech tagger

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
A parametric approach to vocal tract length normalization

ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 01
Speaker normalization using efficient frequency warping procedures

ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 01
The 1998 HTK system for transcription of conversational telephone speech

ICASSP '99 Proceedings of the Acoustics, Speech, and Signal Processing, 1999. on 1999 IEEE International Conference - Volume 01

A monotonic statistical machine translation approach to speaking style transformation

Computer Speech and Language

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose a novel approach based on a statistical transformation framework for language and pronunciation modeling of spontaneous speech. Since it is not practical to train a spoken-style model using numerous spoken transcripts, the proposed approach generates a spoken-style model by transforming an orthographic model trained with document archives such as the minutes of meetings and the proceedings of lectures. The transformation is based on a statistical model estimated using a small amount of a parallel corpus, which consists of faithful transcripts aligned with their orthographic documents. Patterns of transformation, such as substitution, deletion, and insertion of words, are extracted with their word and part-of-speech (POS) contexts, and transformation probabilities are estimated based on occurrence statistics in a parallel aligned corpus. For pronunciation modeling, subword-based mapping between baseforms and surface forms is extracted with their occurrence counts, then a set of rewrite rules with their probabilities are derived as a transformation model. Spoken-style language and pronunciation (surface forms) models can be predicted by applying these transformation patterns to a document-style language model and baseforms in a lexicon, respectively. The transformed models significantly reduced perplexity and word error rates (WERs) in a task of transcribing congressional meetings, even though the domains and topics were different from the parallel corpus. This result demonstrates the generality and portability of the proposed framework.