A Study on Prosody and Discourse Structure in Cooperative Dialogues

  • Authors:
  • Shin Nakajima;James F. Allen

  • Affiliations:
  • -;-

  • Venue:
  • A Study on Prosody and Discourse Structure in Cooperative Dialogues
  • Year:
  • 1993

Abstract

This paper describes how well prosodic information correlates with the topic structure of a cooperative dialogue. To investigate this correlation systematically, first we introduce the notion of utterance unit (UU) as a basic unit in conversations. We define the utterance unit by employing four principles. The grammatical principle is a syntactic criterion in which the UU boundary is set wherever the period can be placed. The pragmatic principle says that each UU corresponds to a basic speech act. In other words, if two neighboring phrases correspond to different speech acts (for instance, acknowledgment and request), they should be taken as two different UUs. The conversational principle addresses the turn-taking aspect of conversations. A UU boundary should be placed wherever the speaker changes. Finally, the prosodic principle says that whenever a medium length or longer pause (750 msec) is inserted between two phrases, they are to be taken as two different UUs. We apply these principles to a speech database containing about one and a half hours of collected dialogue to split the dialogues into a sequence of UUs. We then classify the inter-UU boundaries based on the relationship between two neighboring UUs into four semantic categories: topic shift, topic continuation, elaboration (or clarification), and speech-act continuation. The prosodic parameters measured at each boundary are the onset fundamental frequency (F0), the final F0, and the F0 maximal peak declination ratio (the ratio of the current UU's maximal peak to that of the preceding UU). Our study shows how these prosodic parameters vary depending on the topic structure. Our results can be summarized as follows. (1) The onset F0 value tends to be higher when the topic is changed at the UU boundary. (2) The final F0 value indicates finality and is much higher (on average) at speech-act continuation boundaries than at other boundaries. (3) The maximal peak declination ratio reflects the degree of subordination to the preceding UU. That is, this ratio is lowest at elaboration boundaries and highest at topic shift boundaries. Finally, we discuss discourse structure identification via the prosodic parameters.
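
As an illustration only, the three boundary parameters and the pause criterion named in the abstract could be computed along the following lines. This is a minimal sketch, not the authors' procedure. The `UtteranceUnit` class, the function names, and the representation of a UU as a list of voiced-frame F0 values in Hz are all hypothetical. The sketch further assumes that the onset F0 is read from the UU following the boundary, that the final F0 is read from the UU preceding it, and that "750 msec or longer" means a pause of at least 750 ms.

```python
from dataclasses import dataclass
from typing import List

# Pause length from the abstract's prosodic principle (750 msec).
PAUSE_THRESHOLD_MS = 750.0


@dataclass
class UtteranceUnit:
    """Hypothetical container for one utterance unit (UU)."""
    speaker: str
    f0_hz: List[float]  # voiced-frame F0 values in time order (assumed representation)


@dataclass
class BoundaryParameters:
    onset_f0: float    # first F0 value of the UU after the boundary (assumption)
    final_f0: float    # last F0 value of the UU before the boundary (assumption)
    peak_ratio: float  # current UU's maximal F0 peak / preceding UU's maximal F0 peak


def is_prosodic_boundary(pause_ms: float) -> bool:
    """Prosodic principle: treat a pause of at least 750 ms as a UU boundary (assumed >=)."""
    return pause_ms >= PAUSE_THRESHOLD_MS


def boundary_parameters(prev: UtteranceUnit, curr: UtteranceUnit) -> BoundaryParameters:
    """Compute the three parameters at the boundary between prev and curr."""
    if not prev.f0_hz or not curr.f0_hz:
        raise ValueError("both utterance units need at least one voiced F0 value")
    return BoundaryParameters(
        onset_f0=curr.f0_hz[0],
        final_f0=prev.f0_hz[-1],
        peak_ratio=max(curr.f0_hz) / max(prev.f0_hz),
    )


if __name__ == "__main__":
    # Made-up F0 values, for demonstration only.
    prev_uu = UtteranceUnit("A", [180.0, 220.0, 150.0])
    curr_uu = UtteranceUnit("A", [200.0, 110.0, 120.0])
    print(boundary_parameters(prev_uu, curr_uu))
    print(is_prosodic_boundary(800.0))
```

Under the abstract's findings, a peak ratio well below 1 would be more typical of an elaboration boundary and a higher ratio of a topic shift, but the sketch sets no thresholds because the abstract reports none.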