Evaluations of NLG systems: common corpus and tasks or common dimensions and metrics?

  • Authors:
  • Cécile Paris, Nathalie Colineau, Ross Wilkinson

  • Affiliation:
  • CSIRO ICT Centre, North Ryde, NSW, Australia

  • Venue:
  • INLG '06: Proceedings of the Fourth International Natural Language Generation Conference
  • Year:
  • 2006

Abstract

In this position paper, we argue that a common task and corpus are not the only way to evaluate Natural Language Generation (NLG) systems. Such an approach may, in fact, take too narrow a view of evaluation and thus not be the best way to evaluate these systems. The aim of a common task and corpus is to allow for a comparative evaluation of systems, looking at the systems' performance; it is thus a "system-oriented" view of evaluation. We argue here that, if we are to take a system-oriented view of evaluation, the community might be better served by enlarging the scope of evaluation, defining common dimensions and metrics with which to evaluate systems and approaches. We also argue that end-user (or usability) evaluations form another important aspect of a system's evaluation and should not be forgotten.