Phonological parsing for bi-directional letter-to-sound/sound-to-letter generation

Authors:
Helen M. Meng; Stephanie Seneff; Victor W. Zue
Affiliations:
Massachusetts Institute of Technology, Cambridge, Massachusetts; Massachusetts Institute of Technology, Cambridge, Massachusetts; Massachusetts Institute of Technology, Cambridge, Massachusetts
Venue:
HLT '94 Proceedings of the workshop on Human Language Technology
Year:
1994

Abstract

In this paper, we describe a reversible letter-to-sound/sound-to-letter generation system based on an approach that combines a rule-based formalism with data-driven techniques. We adopt a probabilistic parsing strategy to provide a hierarchical lexical analysis of a word, including information such as morphology, stress, syllabification, phonemics and graphemics. Long-distance constraints are propagated by enforcing local constraints throughout the hierarchy. Our training and testing corpora are derived from the high-frequency portion of the Brown Corpus (10,000 words), augmented with markers indicating stress and word morphology. We evaluated performance on an unseen test set. The percentages of nonparsable words for letter-to-sound and sound-to-letter generation were 6% and 5%, respectively. Of the remaining words, our system achieved a word accuracy of 71.8% and a phoneme accuracy of 92.5% for letter-to-sound generation, and a word accuracy of 55.8% and a letter accuracy of 89.4% for sound-to-letter generation. We also compared our hierarchical approach with an alternative, single-layer approach to demonstrate how the hierarchy provides a parsimonious description for English orthographic-phonological regularities, while simultaneously attaining competitive generation accuracy.
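
The word accuracies above are reported over parsable words only. As a rough illustration of what they imply for the full test set, the sketch below folds the nonparsable rate into the word accuracy. It assumes that every nonparsable word would be scored as an error; the abstract does not state this, and the function name and dictionary layout are hypothetical.

```python
# Figures taken from the abstract, expressed as fractions.
REPORTED = {
    "letter_to_sound": {"nonparsable": 0.06, "word_accuracy": 0.718},
    "sound_to_letter": {"nonparsable": 0.05, "word_accuracy": 0.558},
}


def overall_word_accuracy(nonparsable: float, word_accuracy: float) -> float:
    """Word accuracy over the whole test set.

    Assumption (not from the paper): a nonparsable word counts as an error,
    so only the parsable fraction can contribute correct words.
    """
    if not (0.0 <= nonparsable <= 1.0 and 0.0 <= word_accuracy <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    return (1.0 - nonparsable) * word_accuracy


if __name__ == "__main__":
    for task, rates in REPORTED.items():
        overall = overall_word_accuracy(**rates)
        print(f"{task}: {100 * overall:.1f}% of all test words")
```

Under this assumption, the implied full-set word accuracies are about 67.5% for letter-to-sound and 53.0% for sound-to-letter generation. The same adjustment cannot be applied to the phoneme and letter accuracies, because the abstract does not report how many phonemes or letters the nonparsable words contain.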