Towards using prosody in speech recognition/understanding systems: differences between read and spontaneous speech

  • Authors:
  • Kim E. A. Silverman, Eleonora Blaauw, Judith Spitz, John F. Pitrelli

  • Affiliations:
  • NYNEX Science and Technology, White Plains, NY (all authors)

  • Venue:
  • HLT '91 Proceedings of the workshop on Speech and Natural Language
  • Year:
  • 1992

Abstract

A persistent problem for keyword-driven speech recognition systems is that users often embed the to-be-recognized words or phrases in longer utterances. The recognizer needs to locate the relevant sections of the speech signal and ignore extraneous words. Prosody might provide an extra source of information to help locate target words embedded in other speech. In this paper we examine some prosodic characteristics of 160 such utterances and compare matched read and spontaneous versions. Half of the utterances are from a corpus of spontaneous answers to requests for the name of a city, recorded from calls to Directory Assistance Operators. The other half are the same word strings read by volunteers attempting to model the real dialogue. Results show a consistent pattern across both sets of data: embedded city names almost always bear nuclear pitch accents and occur in their own intonational phrases. However, the distributions of the tonal make-up of these prosodic features differ markedly in read versus spontaneous speech, implying that if algorithms that exploit these prosodic regularities are trained on read speech, the resulting probabilities are likely to be poor models of real user speech.
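The training-mismatch concern in the final sentence can be made concrete with a standard divergence measure. The sketch below compares two hypothetical distributions over pitch-accent types; the category labels and numbers are purely illustrative, not measurements from this paper.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits between two
    discrete distributions given as equal-length probability lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical probabilities of three pitch-accent types (e.g. H*, L+H*, L*)
# on embedded city names -- illustrative values only.
read_speech = [0.70, 0.20, 0.10]
spontaneous_speech = [0.40, 0.35, 0.25]

# A recognizer trained on read speech uses read_speech as its model (q)
# while real users produce spontaneous_speech (p); a positive divergence
# quantifies how badly the read-speech probabilities fit real usage.
mismatch = kl_divergence(spontaneous_speech, read_speech)
print(f"model mismatch: {mismatch:.3f} bits")
```

Under these made-up numbers the divergence is well above zero, whereas it would be exactly zero if the read and spontaneous distributions matched, which is the scenario the abstract argues does not hold.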