This paper presents DEPEVAL(summ), a dependency-based metric for automatic evaluation of summaries. Using a reranking parser and a Lexical-Functional Grammar (LFG) annotation, we produce a set of dependency triples for each summary. The dependency set for each candidate summary is then automatically compared against dependencies generated from model summaries. We examine a number of variations of the method, including adding WordNet, allowing partial matching, and removing relation labels from the dependencies. In a test on TAC 2008 and DUC 2007 data, DEPEVAL(summ) achieves correlations with human judgments that are comparable to or higher than those of the popular evaluation metrics ROUGE and Basic Elements (BE).
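To make the comparison step concrete, the following is a minimal sketch of how two sets of dependency triples might be scored against each other. It is an illustration under stated assumptions, not the DEPEVAL(summ) implementation: the abstract does not specify the scoring formula, so precision, recall, and F1 over multisets of triples are assumed here. The function name `triple_overlap`, the `(relation, head, dependent)` tuple layout, and the example triples are all hypothetical. The `use_labels` flag imitates the variation in which relation labels are removed before matching; WordNet and partial matching are not modelled.

```python
from collections import Counter


def triple_overlap(candidate, model, use_labels=True):
    """Return (precision, recall, f1) for two lists of dependency triples.

    Each triple is assumed to be a (relation, head, dependent) tuple.
    With use_labels=False the relation label is ignored, so only the
    head-dependent pair has to match.
    """
    def key(triple):
        relation, head, dependent = triple
        return (relation, head, dependent) if use_labels else (head, dependent)

    cand_counts = Counter(key(t) for t in candidate)
    model_counts = Counter(key(t) for t in model)

    # Multiset intersection: a repeated triple is credited at most as
    # many times as it occurs in both inputs.
    matched = sum((cand_counts & model_counts).values())

    precision = matched / sum(cand_counts.values()) if cand_counts else 0.0
    recall = matched / sum(model_counts.values()) if model_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hypothetical triples, as a parser plus annotation step might produce them.
candidate = [
    ("subj", "approve", "committee"),
    ("obj", "approve", "budget"),
    ("adjunct", "budget", "new"),
]
model = [
    ("subj", "approve", "committee"),
    ("obl", "approve", "budget"),
    ("adjunct", "budget", "annual"),
]

print("labelled:  ", triple_overlap(candidate, model))
print("unlabelled:", triple_overlap(candidate, model, use_labels=False))
```

In this toy example the unlabelled variant credits the `approve`/`budget` pair even though the two parses disagree on its relation label, which is the kind of leniency that dropping labels introduces. How several model summaries are combined is left open in this sketch.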