Evaluations of NLG systems: common corpus and tasks or common dimensions and metrics?

  • Authors:
  • Cécile Paris, Nathalie Colineau, Ross Wilkinson

  • Affiliation:
  • CSIRO ICT Centre, North Ryde, NSW, Australia

  • Venue:
  • INLG '06: Proceedings of the Fourth International Natural Language Generation Conference
  • Year:
  • 2006

Abstract

In this position paper, we argue that a common task and corpus are not the only way to evaluate Natural Language Generation (NLG) systems. Such an approach may, in fact, take too narrow a view of evaluation and thus not be the best way to evaluate these systems. The aim of a common task and corpus is to allow for a comparative evaluation of systems, looking at the systems' performance; it is thus a "system-oriented" view of evaluation. We argue here that, if we are to take a system-oriented view of evaluation, the community might be better served by enlarging the scope of evaluation, defining common dimensions and metrics with which to evaluate systems and approaches. We also argue that end-user (or usability) evaluations form another important aspect of a system's evaluation and should not be forgotten.