Evaluations of NLG systems are generally quantitative, that is, based on corpus-comparison statistics and/or the results of experiments with people. The outcomes of such evaluations are important for demonstrating whether or not an NLG system is successful, but they leave gaps in understanding why this is the case. Alternatively, qualitative evaluations carried out by experts provide knowledge about where a system needs to be improved. In this paper we describe two such evaluations carried out for the BT-Nurse system, using two different methodologies (content analysis and discourse analysis). The outcomes of these evaluations are discussed in comparison with what was learnt from a quantitative evaluation of BT-Nurse. Implications for the role of similar evaluations in NLG are also discussed.