This paper investigates the relationship between the results of an extrinsic, task-based evaluation of an NLG system and various metrics that measure both surface and deep semantic textual properties, including relevance; the deep semantic metrics rely heavily on domain knowledge. We show that these metrics correlate systematically with some measures of task performance. The core argument of the paper is that metrics grounded in domain knowledge shed more light on the relationship between the deep semantic properties of a text and task performance.
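As a minimal sketch of the kind of analysis the abstract describes, the following Python snippet computes the correlation between per-text metric scores and task-performance scores. All names and numbers here are hypothetical and purely illustrative; the paper's actual metrics, data, and statistical procedure are not reproduced.

```python
# Minimal sketch: correlating automatic metric scores with task performance.
# Assumes paired, per-text scores; the data below is purely illustrative.
from scipy.stats import pearsonr, spearmanr

# Hypothetical scores for the same set of generated texts.
metric_scores = [0.42, 0.55, 0.61, 0.48, 0.70, 0.66]  # e.g. a knowledge-based relevance metric
task_scores   = [0.50, 0.58, 0.72, 0.44, 0.81, 0.69]  # e.g. per-text task success rate

r, r_p = pearsonr(metric_scores, task_scores)        # linear (Pearson) correlation
rho, rho_p = spearmanr(metric_scores, task_scores)   # rank (Spearman) correlation

print(f"Pearson r = {r:.2f} (p = {r_p:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {rho_p:.3f})")
```

A systematic correlation in such an analysis would suggest that the metric tracks the textual properties that actually matter for task performance, which is the relationship the paper examines.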