Can syllabification improve pronunciation by analogy of English?

Authors:
Yannick Marchand; Robert I. Damper
Affiliations:
Institute for Biodiagnostics (Atlantic), National Research Council Canada, Neuroimaging Research Laboratory, 1796 Summer Street, Suite 3900, Halifax, Nova Scotia, Canada B3H 3A7; Image, Speech and Intelligent Systems (ISIS) Research Group, School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK (email: rid@ecs.soton.ac.uk)
Venue:
Natural Language Engineering
Year:
2007

Abstract

In spite of the difficulty of defining the syllable unequivocally, and controversy over its role in theories of spoken and written language processing, the syllable is a potentially useful unit in several practical tasks that arise in computational linguistics and speech technology. For instance, syllable structure might embody valuable information for building word models in automatic speech recognition, and concatenative speech synthesis might use syllables or demisyllables as basic units. In this paper, we first present an algorithm for determining syllable boundaries in the orthographic form of unknown words that works by analogical reasoning from a database or corpus of known syllabifications. We call this syllabification by analogy (SbA). It is motivated similarly to our existing pronunciation by analogy (PbA) method, which predicts pronunciations for unknown words (specified by their spellings) by inference from a dictionary of known word spellings and corresponding pronunciations. We show that including perfect (according to the corpus) syllable boundary information in the orthographic input can dramatically improve the performance of pronunciation by analogy of English words, but such information would not be available to a practical system. We therefore investigate combining automatically inferred syllabification and pronunciation in two different ways: the series model, in which syllabification is followed sequentially by pronunciation generation, and the parallel model, in which syllabification and pronunciation are inferred simultaneously. Unfortunately, neither improves performance over PbA without syllabification. Possible reasons for this failure are explored via an analysis of syllabification and pronunciation errors.
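The abstract describes SbA only at a high level. Below is a minimal sketch of the idea, assuming a PbA-style substring-matching lattice. In this lattice, each arc covers a stretch of the input spelling that also occurs in a corpus word. Instead of phonemes, each arc carries junction labels: 1 means a syllable boundary falls between two letters, and 0 means it does not. The scoring rule is a simplified heuristic chosen for illustration: fewest arcs first, with ties broken by the product of arc counts. It is not claimed to be the authors' SbA scoring. The function names `parse`, `render`, `build_lattice` and `syllabify`, and the toy corpus, are hypothetical.

```python
from collections import defaultdict


def parse(syllabified):
    """Split a hyphenated form such as 'an-a-log-y' into its letters and a
    tuple of junction labels: 1 if a syllable boundary falls between letters
    k and k+1, else 0."""
    letters, junctions = [], []
    pending = 0
    for ch in syllabified:
        if ch == "-":
            pending = 1  # boundary before the next letter
        else:
            if letters:
                junctions.append(pending)
            letters.append(ch)
            pending = 0
    return "".join(letters), tuple(junctions)


def render(word, junctions):
    """Inverse of parse: reinsert '-' wherever a junction label is 1."""
    out = [word[0]]
    for k, boundary in enumerate(junctions):
        if boundary:
            out.append("-")
        out.append(word[k + 1])
    return "".join(out)


def build_lattice(word, corpus):
    """Return {(i, j): {pattern: count}} for every substring word[i..j]
    (at least two letters) that occurs in a corpus entry. Each pattern is the
    tuple of junction labels the corpus entry assigns inside that substring."""
    arcs = defaultdict(lambda: defaultdict(int))
    n = len(word)
    for entry_word, entry_junctions in corpus:
        for i in range(n - 1):
            for j in range(i + 1, n):
                sub = word[i:j + 1]
                start = entry_word.find(sub)
                if start == -1:
                    break  # longer substrings starting at i cannot match either
                while start != -1:
                    arcs[(i, j)][entry_junctions[start:start + j - i]] += 1
                    start = entry_word.find(sub, start + 1)
    return arcs


def syllabify(word, corpus):
    """Pick a path of arcs from the first to the last letter (consecutive arcs
    share one letter), preferring the fewest arcs and breaking ties by the
    largest product of arc counts (a simplified, assumed scoring rule).
    Returns None if no complete path exists."""
    n = len(word)
    if n < 2:
        return word
    arcs = build_lattice(word, corpus)
    # best[k] = (arcs used, product of counts, junction labels up to letter k)
    best = {0: (0, 1, ())}
    for j in range(1, n):
        candidates = []
        for i in range(j):
            if i not in best:
                continue
            n_arcs, score, labels = best[i]
            for pattern, count in arcs.get((i, j), {}).items():
                candidates.append((n_arcs + 1, score * count, labels + pattern))
        if candidates:
            best[j] = min(candidates, key=lambda c: (c[0], -c[1]))
    if n - 1 not in best:
        return None  # no chain of matched substrings covers the whole word
    return render(word, best[n - 1][2])


if __name__ == "__main__":
    corpus = [parse(w) for w in ["pic-nic", "mag-net"]]
    print(syllabify("picnet", corpus))  # pic-net
```

Here "picnet" is syllabified by combining the arc for "picn" (taken from "pic-nic") with the arc for "net" (taken from "mag-net"). The word gets no output at all when some letter pair never occurs in the corpus, a case that any practical analogy-based system has to handle somehow. A series model in the sense of the abstract would pass the `render` output on to a separate pronunciation stage.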