Evaluating Natural Language Processing Systems: An Analysis and Review
While natural language generation (NLG) has a strong evaluation tradition, particularly in user-based and task-oriented evaluation, it has never compared different approaches and techniques by evaluating their performance on the same tasks (shared-task evaluation, STE). NLG is characterised by a lack of consolidation of results and by isolation from the rest of NLP, where STE is now standard. It is, moreover, a shrinking field (state-of-the-art MT and summarisation no longer perform generation as a subtask) which lacks the kind of funding and participation that natural language understanding (NLU) has attracted.