A parser for real-time speech synthesis of conversational texts

Authors:
Joan Bachenko;Jeffrey Daugherty;Eileen Fitzpatrick
Affiliations:
AT&T Bell Laboratories, Murray Hill, NJ;AT&T Bell Laboratories, Murray Hill, NJ;AT&T Bell Laboratories, Murray Hill, NJ
Venue:
ANLC '92 Proceedings of the third conference on Applied natural language processing
Year:
1992

References (5)

A computational grammar of discourse-neutral prosodic phrasing in English. Computational Linguistics.
Spelling correction for the telecommunications network for the deaf. Communications of the ACM.
Theory of Syntactic Recognition for Natural Languages.
The MULTIVOC text-to-speech system. ANLC '88 Proceedings of the second conference on Applied natural language processing.
Parsing with a small dictionary for applications such as text to speech. Computational Linguistics.

Cited by (2)

Automated authoring of hypermedia documents of video programs. Proceedings of the third ACM international conference on Multimedia.
Capitalization Recovery for Text. Information Retrieval Techniques for Speech Applications (book based on the workshop "Information Retrieval Techniques for Speech Applications", held as part of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in New Orleans, USA, in September 2001).

Abstract

This paper describes a text-to-speech application for speech-impaired, deaf, and hard-of-hearing people. The application is unusual because it requires real-time synthesis of unedited, spontaneously generated conversational texts transmitted via a Telecommunications Device for the Deaf (TDD). We describe a parser that we have implemented as a front end for a version of the Bell Laboratories text-to-speech synthesizer (Olive and Liberman 1985). The parser prepares TDD texts for synthesis by (a) performing lexical regularization of abbreviations and some non-standard forms, and (b) identifying prosodic phrase boundaries. Rules for identifying phrase boundaries are derived from the prosodic phrase grammar described in Bachenko and Fitzpatrick (1990). Following the parent analysis, these rules use a mix of syntactic and phonological factors to identify phrase boundaries. Unlike the parent system, however, they build no hierarchical structure and therefore need no stacking mechanism, which permits the system to operate in near real time. As a component of the text-to-speech system, the parser has undergone rigorous testing during a successful three-month field trial at an AT&T telecommunications center in California. In addition, laboratory evaluations indicate that the parser's performance compares favorably with human judgments about phrasing.
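The two parser stages named in the abstract, lexical regularization followed by flat, stack-free phrase boundary marking, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the abbreviation table, the cue-word list, the minimum phrase length, and the function names `regularize`, `mark_phrases`, and `prepare` are hypothetical and are not taken from the paper. The boundary rule is a simple stand-in for the paper's rules, which the abstract does not spell out.

```python
import re

# Hypothetical abbreviation table (assumption; the paper's lexicon is not
# given in the abstract). "ga" = "go ahead" is a common TDD turn marker.
ABBREVIATIONS = {
    "u": "you",
    "r": "are",
    "pls": "please",
    "tmw": "tomorrow",
    "ga": "go ahead",
}

# Hypothetical cue words standing in for the syntactic factor (assumption).
BOUNDARY_CUES = {"and", "but", "because", "so", "when", "if",
                 "to", "at", "in", "on", "for", "with"}

# Hypothetical length threshold standing in for the phonological factor.
MIN_PHRASE_WORDS = 3


def regularize(text):
    """Lowercase, tokenize, and expand abbreviations into full words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    words = []
    for tok in tokens:
        # An expansion may be more than one word, e.g. "go ahead".
        words.extend(ABBREVIATIONS.get(tok, tok).split())
    return words


def mark_phrases(words):
    """Single left-to-right pass: no tree is built and no stack is kept."""
    phrases, current = [], []
    for word in words:
        # Start a new phrase before a cue word once the current phrase
        # is long enough.
        if word in BOUNDARY_CUES and len(current) >= MIN_PHRASE_WORDS:
            phrases.append(current)
            current = []
        current.append(word)
    if current:
        phrases.append(current)
    return phrases


def prepare(text):
    """Regularize the text and join phrases with ' | ' as a boundary mark."""
    return " | ".join(" ".join(p) for p in mark_phrases(regularize(text)))


if __name__ == "__main__":
    print(prepare("pls call me tmw when u r home ga"))
```

The only state carried across words is the current phrase, so the work per word is constant; this is the property that the abstract credits for near-real-time operation, although the actual rules in the paper are richer than this cue-word heuristic.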