Certain generation applications may profit from the use of stochastic methods. In developing stochastic methods, it is crucial to be able to quickly assess the relative merits of different approaches or models. In this paper, we present several types of intrinsic (system-internal) metrics which we have used for baseline quantitative assessment. This quantitative assessment should then be augmented with a fuller evaluation that examines qualitative aspects. To this end, we describe an experiment that tests the correlation between the quantitative metrics and human qualitative judgments. The experiment confirms that intrinsic metrics cannot replace human evaluation, but that some correlate significantly with human judgments of quality and understandability and can therefore be used for evaluation during development.
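As a rough illustration of the workflow the abstract describes (not the paper's actual implementation), the Python sketch below scores candidate sentences against a reference using a simple edit-distance-based string accuracy, one common intrinsic metric for stochastic generation, and then computes its Spearman rank correlation with human quality ratings. The candidate strings, ratings, and all function names are invented for the example.

```python
# Minimal sketch: an edit-distance-based intrinsic metric plus its rank
# correlation with (hypothetical) human judgments. Illustrative only; the
# data values and helper names below are assumptions, not from the paper.

def edit_ops(reference, candidate):
    """Minimum insertions + deletions + substitutions (Levenshtein) needed
    to turn `candidate` into `reference`, counted over whitespace tokens."""
    ref, cand = reference.split(), candidate.split()
    # dp[i][j] = edit distance between ref[:i] and cand[:j]
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(cand) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(cand) + 1):
            cost = 0 if ref[i - 1] == cand[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(cand)], len(ref)

def simple_string_accuracy(reference, candidate):
    """1 - (edit operations / reference length); higher is better."""
    ops, ref_len = edit_ops(reference, candidate)
    return 1.0 - ops / ref_len

def spearman(xs, ys):
    """Spearman rank correlation via Pearson on ranks (no tie correction)."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0.0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

if __name__ == "__main__":
    reference = "there was no cost estimate for the second phase"
    candidates = [  # hypothetical system outputs, best to worst
        "there was no cost estimate for the second phase",
        "no cost estimate was there for the second phase",
        "for second the phase estimate cost no was there",
    ]
    human_ratings = [5.0, 3.5, 1.5]  # hypothetical quality judgments
    scores = [simple_string_accuracy(reference, c) for c in candidates]
    print("metric scores:", [round(s, 2) for s in scores])
    print("spearman rho vs. human ratings:",
          round(spearman(scores, human_ratings), 2))
```

In a development setting, the same comparison would be run over a held-out set of many outputs per model; a significant positive correlation is what licenses using the cheap intrinsic metric as a stand-in during iteration, while final claims still rest on the human evaluation.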