Evaluation metrics for generation

  • Authors: Srinivas Bangalore, Owen Rambow, Steve Whittaker
  • Affiliation: AT&T Labs - Research, NJ (all authors)
  • Venue: INLG '00, Proceedings of the First International Conference on Natural Language Generation - Volume 14
  • Year: 2000


Abstract

Certain generation applications may profit from the use of stochastic methods. In developing stochastic methods, it is crucial to be able to quickly assess the relative merits of different approaches or models. In this paper, we present several types of intrinsic (system-internal) metrics which we have used for baseline quantitative assessment. This quantitative assessment should then be augmented with a fuller evaluation that examines qualitative aspects. To this end, we describe an experiment that tests the correlation between the quantitative metrics and human qualitative judgment. The experiment confirms that intrinsic metrics cannot replace human evaluation, but some correlate significantly with human judgments of quality and understandability and can be used for evaluation during development.
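To illustrate the kind of comparison the abstract describes, the sketch below computes a simple intrinsic metric (one minus normalized string edit distance against a reference, an assumed stand-in for the paper's actual metrics) and its Pearson correlation with human quality ratings. The metric, the sample sentences, and the rating scale are illustrative assumptions, not data or definitions from the paper.

```python
# Hypothetical sketch: correlate an intrinsic metric with human judgments.
# The metric and the sample data are illustrative, not taken from the paper.

def edit_distance(a, b):
    """Levenshtein distance between token lists a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def simple_accuracy(generated, reference):
    """1 - normalized edit distance: higher means closer to the reference."""
    gen, ref = generated.split(), reference.split()
    if not gen and not ref:
        return 1.0
    return 1.0 - edit_distance(gen, ref) / max(len(gen), len(ref))

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Illustrative data: (generated sentence, reference sentence, human rating on 1-7).
outputs = [
    ("the flight leaves at noon",  "the flight departs at noon", 6.0),
    ("leaves flight the at noon",  "the flight departs at noon", 2.5),
    ("the flight departs at noon", "the flight departs at noon", 7.0),
]

metric_scores = [simple_accuracy(gen, ref) for gen, ref, _ in outputs]
human_scores = [rating for _, _, rating in outputs]
print("correlation:", round(pearson(metric_scores, human_scores), 3))
```

A positive, significant correlation on data like this would support using the intrinsic metric for quick assessment during development, while the qualitative human evaluation remains the final arbiter, as the abstract argues.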