Evaluating Natural Language Processing Systems: An Analysis and Review
While natural language generation (NLG) has a strong evaluation tradition, particularly in user-based and task-oriented evaluation, it has never compared different approaches and techniques by evaluating their performance on the same tasks (shared-task evaluation, STE). NLG is characterised by a lack of consolidation of results and by isolation from the rest of NLP, where STE is now standard. It is, moreover, a shrinking field (state-of-the-art MT and summarisation no longer perform generation as a subtask) which lacks the kind of funding and participation that natural language understanding (NLU) has attracted.