Summary evaluation: together we stand NPowER-ed
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
The development of summarization systems requires reliable similarity (evaluation) measures that compare system outputs with human references. A reliable measure should correspond well with human judgements. However, a measure's reliability depends on the test collection on which it is meta-evaluated; for this reason, it has not yet been possible to reliably establish which evaluation measures are best for automatic summarization. In this paper, we propose an unsupervised method called Heterogeneity-Based Ranking (HBR) that combines summarization evaluation measures without requiring human assessments. Our empirical results indicate that HBR achieves a correspondence with human assessments similar to that of the best single measure for every observed corpus. In addition, HBR results are more robust across topics than those of single measures.
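To make the idea of combining evaluation measures without human assessments concrete, here is a minimal illustrative sketch: each candidate summary is ranked under every measure, and the per-measure ranks are averaged into a single combined ranking. This is a deliberately simplified stand-in (plain rank averaging), not the HBR algorithm described in the paper; all function names and scores are hypothetical.

```python
# Toy illustration of unsupervised measure combination: average each
# candidate's rank across several automatic measures. This is plain
# rank averaging, NOT the paper's Heterogeneity-Based Ranking (HBR).

def rank(scores):
    # Rank positions (0 = best); ties broken by original order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for pos, i in enumerate(order):
        ranks[i] = pos
    return ranks

def combined_ranking(measure_scores):
    # measure_scores: one score list per measure, each aligned over
    # the same candidate summaries.
    n = len(measure_scores[0])
    avg = [sum(rank(m)[i] for m in measure_scores) / len(measure_scores)
           for i in range(n)]
    # Candidate indices ordered best-first by average rank.
    return sorted(range(n), key=lambda i: avg[i])

# Three hypothetical measures scoring four candidate summaries.
rouge = [0.40, 0.55, 0.30, 0.50]
bleu  = [0.35, 0.60, 0.20, 0.45]
dep   = [0.50, 0.58, 0.25, 0.40]
print(combined_ranking([rouge, bleu, dep]))  # → [1, 3, 0, 2]
```

No human judgements enter the combination step: the ranking is built purely from agreement among the automatic measures themselves, which is the setting the abstract describes.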