Summary evaluation has been a distinct domain of research for several years. Evaluating a summary appears to be a high-level cognitive process for humans and is therefore difficult to reproduce automatically. Even though several automatic evaluation methods correlate well with human judgments at the system level, they fail to achieve comparable results when judging individual summaries. In this work, we propose the NPowER evaluation method, which uses machine learning to combine a set of measures from the family of "n-gram graph"-based summary evaluation methods. First, we show that the combined, optimized use of these measures outperforms each of them individually. Second, we compare the proposed method to a combination of ROUGE metrics. Third, based on the results of feature selection, we discuss what could make future evaluation measures better. We show that the approach readily provides per-summary evaluations that clearly outperform existing evaluation systems, while bringing the different measures under a unified view.
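To make the idea of a machine-learned combination of evaluation measures concrete, the following Python sketch shows how several automatic scores for a summary could be fused into a single predicted human score with a learned regressor. This is an illustrative assumption, not the authors' implementation: the feature names, the toy numbers, and the use of scikit-learn's LinearRegression are all placeholders standing in for whatever measures and learner one actually trains on human judgments.

# Minimal sketch (assumed setup, not the NPowER implementation): fuse several
# automatic summary-evaluation scores into one predicted human score.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row holds the automatic scores for one summary, e.g.
# [n-gram-graph value similarity, n-gram-graph co-occurrence similarity, ROUGE-2, ROUGE-SU4].
# Values here are illustrative placeholders.
X_train = np.array([
    [0.42, 0.38, 0.11, 0.15],
    [0.55, 0.51, 0.16, 0.20],
    [0.30, 0.27, 0.07, 0.10],
    [0.61, 0.58, 0.19, 0.23],
])

# Corresponding human judgments (e.g. responsiveness scores) for the same summaries.
y_train = np.array([2.5, 3.8, 1.9, 4.2])

# Learn a weighted combination of the individual measures.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict a human-like score for a new summary from its automatic scores.
x_new = np.array([[0.48, 0.44, 0.13, 0.17]])
print("predicted human score:", model.predict(x_new)[0])

# The learned weights give a rough sense of which measures contribute most,
# analogous in spirit to the feature-selection discussion above.
print("per-measure weights:", model.coef_)

In practice such a combiner would be trained per summary on corpora with human evaluations (e.g. TAC/DUC data) and validated by its per-summary correlation with those judgments, rather than on a handful of rows as in this toy example.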