Evaluating discourse processing algorithms

  • Authors: Marilyn A. Walker
  • Affiliations: Hewlett Packard Laboratories, Bristol, England, U.K.
  • Venue: ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics
  • Year: 1989

Abstract

In order to take steps towards establishing a methodology for evaluating Natural Language systems, we conducted a case study. We attempt to evaluate two different approaches to anaphoric processing in discourse by comparing the accuracy and coverage of two published algorithms for finding the co-specifiers of pronouns in naturally occurring texts and dialogues. We present the quantitative results of hand-simulating these algorithms, but this analysis naturally gives rise to both a qualitative evaluation and recommendations for performing such evaluations in general. We illustrate the general difficulties encountered with quantitative evaluation. These are problems with: (a) allowing for underlying assumptions, (b) determining how to handle underspecifications, and (c) evaluating the contribution of false positives and error chaining.