Using a randomised controlled clinical trial to evaluate an NLG system

Authors:
Ehud Reiter;Roma Robertson;A. Scott Lennox;Liesl Osman
Affiliations:
University of Aberdeen, Aberdeen, Scotland, UK;University of Aberdeen, Aberdeen, Scotland, UK;University of Aberdeen, Aberdeen, Scotland, UK;University of Aberdeen, Aberdeen, Scotland, UK
Venue:
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Year:
2001

Abstract

The STOP system, which generates personalised smoking-cessation letters, was evaluated by a randomised controlled clinical trial. We believe this is the largest and perhaps most rigorous task-effectiveness evaluation ever performed on an NLG system. The detailed results of the clinical trial have been presented elsewhere, in the medical literature. In this paper we discuss the clinical trial itself: its structure and cost, what we did and did not learn from it (especially considering that the trial showed that STOP was not effective), and how it compares to other NLG evaluation techniques.