In this position paper, we argue that a common task and corpus are not the only ways to evaluate Natural Language Generation (NLG) systems; indeed, they may represent too narrow a view of evaluation and thus not be the best way to evaluate these systems. The aim of a common task and corpus is to allow a comparative evaluation of systems based on their performance; it is thus a "system-oriented" view of evaluation. We argue here that, if we are to take a system-oriented view of evaluation, the community might be better served by broadening that view and defining common dimensions and metrics for evaluating systems and approaches. We also argue that end-user (or usability) evaluations form another important aspect of a system's evaluation and should not be forgotten.