A parser for real-time speech synthesis of conversational texts

  • Authors: Joan Bachenko; Jeffrey Daugherty; Eileen Fitzpatrick

  • Affiliations: AT&T Bell Laboratories, Murray Hill, NJ (all three authors)

  • Venue: ANLC '92: Proceedings of the third conference on Applied natural language processing
  • Year: 1992

Abstract

In this paper, we concern ourselves with an application of text-to-speech for speech-impaired, deaf, and hard of hearing people. The application is unusual because it requires real-time synthesis of unedited, spontaneously generated conversational texts transmitted via a Telecommunications Device for the Deaf (TDD). We describe a parser that we have implemented as a front end for a version of the Bell Laboratories text-to-speech synthesizer (Olive and Liberman 1985). The parser prepares TDD texts for synthesis by (a) performing lexical regularization of abbreviations and some non-standard forms, and (b) identifying prosodic phrase boundaries. Rules for identifying phrase boundaries are derived from the prosodic phrase grammar described in Bachenko and Fitzpatrick (1990). Following the parent analysis, these rules use a mix of syntactic and phonological factors to identify phrase boundaries but, unlike the parent system, they forgo building any hierarchical structure in order to bypass the need for a stacking mechanism; this permits the system to operate in near real time. As a component of the text-to-speech system, the parser has undergone rigorous testing during a successful three-month field trial at an AT&T telecommunications center in California. In addition, laboratory evaluations indicate that the parser's performance compares favorably with human judgments about phrasing.
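
The two preparation steps named in the abstract, lexical regularization followed by flat phrase-boundary marking, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abbreviation table, the cue-word list, the minimum phrase length, and all function names are hypothetical stand-ins chosen for illustration, and the actual rules derived from Bachenko and Fitzpatrick (1990) are not reproduced here. The sketch only shows how boundaries can be placed in a single left-to-right pass without building hierarchical structure or using a stack.

```python
import re

# Hypothetical abbreviation table (illustrative only, not taken from the paper).
ABBREVIATIONS = {
    "u": "you",
    "r": "are",
    "pls": "please",
    "tmw": "tomorrow",
    "ga": "go ahead",
}

# Hypothetical cue words; a boundary may be placed before any of them.
BOUNDARY_CUES = {"and", "but", "or", "because", "when", "if"}

# Assumed stand-in for a phonological length constraint on phrases.
MIN_PHRASE_WORDS = 2


def regularize(text):
    """Lowercase the text, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    words = []
    for token in tokens:
        # An expansion may contain several words, so split it before adding.
        words.extend(ABBREVIATIONS.get(token, token).split())
    return words


def mark_boundaries(words):
    """Group words into phrases in one left-to-right pass.

    Only the phrase currently being built is kept; no tree is constructed
    and no stack is needed.
    """
    phrases = []
    current = []
    for word in words:
        # Start a new phrase before a cue word if the current one is long enough.
        if word in BOUNDARY_CUES and len(current) >= MIN_PHRASE_WORDS:
            phrases.append(current)
            current = []
        current.append(word)
    if current:
        phrases.append(current)
    return phrases


def prepare_for_synthesis(text):
    """Return the regularized text with ' | ' marking phrase boundaries."""
    return " | ".join(" ".join(p) for p in mark_boundaries(regularize(text)))


if __name__ == "__main__":
    print(prepare_for_synthesis("pls call me tmw if u r free ga"))
```

Because each word is examined once and only the current phrase is held in memory, the cost per word is constant, which is the property that makes a flat approach of this kind suitable for near-real-time use.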