Joint-sequence models for grapheme-to-phoneme conversion

Authors:
Maximilian Bisani;Hermann Ney
Affiliations:
Lehrstuhl für Informatik VI, RWTH Aachen University, Ahornstraße 55, D-52056 Aachen, Germany;Lehrstuhl für Informatik VI, RWTH Aachen University, Ahornstraße 55, D-52056 Aachen, Germany
Venue:
Speech Communication
Year:
2008

Abstract

Grapheme-to-phoneme conversion is the task of finding the pronunciation of a word given its written form. It has important applications in text-to-speech and speech recognition. Joint-sequence models are a simple and theoretically stringent probabilistic framework that is applicable to this problem. This article provides a self-contained and detailed description of this method. We present a novel estimation algorithm and demonstrate high accuracy on a variety of databases. Moreover, we study the impact of the maximum approximation in training and transcription, the interaction of model size parameters, n-best list generation, confidence measures, and phoneme-to-grapheme conversion. Our software implementation of the method proposed in this work is available under an Open Source license.
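
To make the idea of a joint-sequence model and the "maximum approximation" more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a unigram model over joint letter/phoneme units (often called graphones) with no end-of-word term, whereas the article works with higher-order models and a proper estimation algorithm. The inventory `TOY_MODEL`, its probabilities, and the function names `joint_prob` and `transcribe` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# A joint unit pairs a letter string with a phoneme tuple.
Graphone = Tuple[str, Tuple[str, ...]]

# Hypothetical toy inventory; the probabilities are made-up values,
# not estimates from any real pronunciation dictionary.
TOY_MODEL: Dict[Graphone, float] = {
    ("m", ("m",)): 0.2,
    ("i", ("I",)): 0.2,
    ("i", ("aI",)): 0.05,
    ("x", ("k", "s")): 0.1,
    ("mi", ("m", "I")): 0.1,
}


def joint_prob(letters: str, phonemes: Tuple[str, ...],
               model: Dict[Graphone, float], use_max: bool = False) -> float:
    """Joint probability of a spelling and a pronunciation.

    Combines all co-segmentations into joint units, either by summing
    over them or, with use_max=True, by keeping only the best one
    (the maximum approximation).
    """
    n, m = len(letters), len(phonemes)
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if table[i][j] == 0.0:
                continue
            for (a, b), p in model.items():
                if not a and not b:
                    continue  # an empty unit would never advance
                i2, j2 = i + len(a), j + len(b)
                if i2 > n or j2 > m:
                    continue
                if letters[i:i2] == a and phonemes[j:j2] == b:
                    cand = table[i][j] * p
                    if use_max:
                        table[i2][j2] = max(table[i2][j2], cand)
                    else:
                        table[i2][j2] += cand
    return table[n][m]


def transcribe(letters: str,
               model: Dict[Graphone, float]) -> Tuple[float, Tuple[str, ...]]:
    """Best single unit sequence for a spelling (maximum approximation).

    Simplification: units with an empty letter side are skipped.
    """
    n = len(letters)
    best = [(0.0, ())] * (n + 1)
    best[0] = (1.0, ())
    for i in range(n):
        score, phones = best[i]
        if score == 0.0:
            continue
        for (a, b), p in model.items():
            if a and letters.startswith(a, i):
                cand = score * p
                if cand > best[i + len(a)][0]:
                    best[i + len(a)] = (cand, phones + b)
    return best[n]


if __name__ == "__main__":
    pron = ("m", "I", "k", "s")
    print(joint_prob("mix", pron, TOY_MODEL))                # sum over segmentations
    print(joint_prob("mix", pron, TOY_MODEL, use_max=True))  # best segmentation only
    print(transcribe("mix", TOY_MODEL))
```

In this toy setting the spelling "mix" with pronunciation m I k s has two co-segmentations (m|i|x and mi|x), so the summed probability is larger than the value under the maximum approximation. How much that difference matters in practice is one of the questions the abstract says the article studies.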