Assessing the trade-off between system building cost and output quality in data-to-text generation

Authors:
Anja Belz;Eric Kow
Affiliations:
School of Computing, Mathematical and Information Sciences, University of Brighton, Brighton, UK;School of Computing, Mathematical and Information Sciences, University of Brighton, Brighton, UK
Venue:
Empirical methods in natural language generation
Year:
2010


Abstract

Data-to-text generation systems tend to be knowledge-based and manually built, which limits their reusability and makes them time- and cost-intensive to create and maintain. Methods for automating (part of) the system-building process exist, but do such methods risk a loss in output quality? In this paper, we investigate the cost/quality trade-off in generation system building. We compare six data-to-text systems created by predominantly automatic techniques against six systems for the same domain created by predominantly manual techniques. We evaluate the systems using intrinsic automatic metrics and human quality ratings. We find that there is some correlation between the degree of automation in the system-building process and output quality (more automation tending to mean lower evaluation scores). We also find discrepancies between the results of the automatic evaluation metrics and those of the human-assessed evaluation experiments. We discuss caveats in assessing system-building cost and the implications of the discrepancies between automatic and human evaluation.