Discourse segmentation by human and automated means

  • Authors:
  • Rebecca J. Passonneau;Diane J. Litman

  • Affiliations:
  • Bellcore and Columbia University;AT&T Labs-Research

  • Venue:
  • Computational Linguistics
  • Year:
  • 1997

Quantified Score

Hi-index 0.00

Visualization

Abstract

The need to model the relation between discourse structure and linguistic features of utterances is almost universally acknowledged in the literature on discourse. However, there is only weak consensus on what the units of discourse structure are, or the criteria for recognizing and generating them. We present quantitative results of a two-part study using a corpus of spontaneous, narrative monologues. The first part of our paper presents a method for empirically validating multitutterance units referred to as discourse segments. We report highly significant results of segmentations performed by naive subjects, where a commonsense notion of speaker intention is the segmentation criterion. In the second part of our study, data abstracted from the subjects' segmentations serve as a target for evaluating two sets of algorithms that use utterance features to perform segmentation. On the first algorithm set, we evaluate and compare the correlation of discourse segmentation with three types of linguistic cues (referential noun phrases, cue words, and pauses). We then develop a second set using two methods: error analysis and machine learning. Testing the new algorithms on a new data set shows that when multiple sources of linguistic knowledge are used concurrently, algorithm performance improves.