Evaluations of NLG systems are generally quantitative, that is, based on corpus-comparison statistics and/or the results of experiments with people. The outcomes of such evaluations are important for demonstrating whether or not an NLG system is successful, but they leave gaps in understanding why this is the case. Alternatively, qualitative evaluations carried out by experts provide knowledge about where a system needs to be improved. In this paper we describe two such evaluations carried out for the BT-Nurse system, using two different methodologies (content analysis and discourse analysis). The outcomes of these evaluations are discussed in comparison with what was learnt from a quantitative evaluation of BT-Nurse. Implications for the role of similar evaluations in NLG are also discussed.