Evaluating evaluation methods for generation in the presence of variation

  • Authors:
  • Amanda Stent;Matthew Marge;Mohit Singhai

  • Affiliations:
  • Stony Brook University, Stony Brook, NY;Stony Brook University, Stony Brook, NY;Stony Brook University, Stony Brook, NY

  • Venue:
  • CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent years have seen increasing interest in automatic metrics for the evaluation of generation systems. When a system can generate syntactic variation, automatic evaluation becomes more difficult. In this paper, we compare the performance of several automatic evaluation metrics using a corpus of automatically generated paraphrases. We show that these evaluation metrics can at least partially measure adequacy (similarity in meaning), but are not good measures of fluency (syntactic correctness). We make several proposals for improving the evaluation of generation systems that produce variation.