Summary evaluation has been a distinct domain of research for several years. Evaluating a summary appears to be a high-level cognitive process for humans and is therefore difficult to reproduce automatically. Even though several automatic evaluation methods correlate well with human judgments at the system level, they fail to achieve comparable results when judging individual summaries. In this work, we propose the NPowER evaluation method, which uses machine learning to combine a set of measures from the family of "n-gram graph"-based summary evaluation methods. First, we show that the combined, optimized use of these measures outperforms each of them individually. Second, we compare the proposed method to a combination of ROUGE metrics. Third, based on the results of feature selection, we discuss what could make future evaluation measures better. We show that the approach readily provides per-summary evaluations that clearly outperform existing evaluation systems, while bringing the different measures under a unified view.
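To make the idea of a machine-learned combination of evaluation measures concrete, the following Python sketch shows how several automatic scores for a summary could be fused into a single predicted human score with a learned regressor. This is an illustrative assumption, not the authors' implementation: the feature names, the toy numbers, and the use of scikit-learn's LinearRegression are all placeholders standing in for whatever measures and learner one actually trains on human judgments.

# Minimal sketch (assumed setup, not the NPowER implementation): fuse several
# automatic summary-evaluation scores into one predicted human score.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row holds the automatic scores for one summary, e.g.
# [n-gram-graph value similarity, n-gram-graph co-occurrence similarity, ROUGE-2, ROUGE-SU4].
# Values here are illustrative placeholders.
X_train = np.array([
    [0.42, 0.38, 0.11, 0.15],
    [0.55, 0.51, 0.16, 0.20],
    [0.30, 0.27, 0.07, 0.10],
    [0.61, 0.58, 0.19, 0.23],
])

# Corresponding human judgments (e.g. responsiveness scores) for the same summaries.
y_train = np.array([2.5, 3.8, 1.9, 4.2])

# Learn a weighted combination of the individual measures.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict a human-like score for a new summary from its automatic scores.
x_new = np.array([[0.48, 0.44, 0.13, 0.17]])
print("predicted human score:", model.predict(x_new)[0])

# The learned weights give a rough sense of which measures contribute most,
# analogous in spirit to the feature-selection discussion above.
print("per-measure weights:", model.coef_)

In practice such a combiner would be trained per summary on corpora with human evaluations (e.g. TAC/DUC data) and validated by its per-summary correlation with those judgments, rather than on a handful of rows as in this toy example.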