Using a randomised controlled clinical trial to evaluate an NLG system

  • Authors:
  • Ehud Reiter; Roma Robertson; A. Scott Lennox; Liesl Osman

  • Affiliations:
  • University of Aberdeen, Aberdeen, Scotland, UK; University of Aberdeen, Aberdeen, Scotland, UK; University of Aberdeen, Aberdeen, Scotland, UK; University of Aberdeen, Aberdeen, Scotland, UK

  • Venue:
  • ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
  • Year:
  • 2001

Abstract

The STOP system, which generates personalised smoking-cessation letters, was evaluated by a randomised controlled clinical trial. We believe this is the largest and perhaps most rigorous task effectiveness evaluation ever performed on an NLG system. The detailed results of the clinical trial have been presented elsewhere, in the medical literature. In this paper we discuss the clinical trial itself: its structure and cost, what we did and did not learn from it (especially considering that the trial showed that STOP was not effective), and how it compares to other NLG evaluation techniques.