We present a series of experiments to demonstrate the validity of Relative Utility (RU) as a measure for evaluating extractive summarizers. RU is applicable in both single-document and multi-document summarization, is extendable to arbitrary compression rates with no extra annotation effort, and takes into account both random system performance and interjudge agreement. Our results using the JHU summary corpus indicate that RU is a reasonable and often superior alternative to several common evaluation metrics.
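The abstract describes RU only at a high level, so the following is a minimal sketch of how an RU score might be computed, assuming each judge assigns a per-sentence utility score (e.g. on a 0-10 scale): the raw score is the pooled utility of the sentences the system selected, relative to the best attainable extract of the same length, and a normalized variant rescales that against random performance and the interjudge ceiling. The function names (`relative_utility`, `normalized_ru`), the utility-matrix layout, and the demo numbers are illustrative assumptions, not taken from the paper.

```python
def relative_utility(selected, utilities, k):
    """Raw Relative Utility (sketch, not the paper's reference code).

    selected  -- indices of the sentences the summarizer extracted
    utilities -- utilities[j][i]: utility judge j assigned sentence i
    k         -- extract length in sentences (set by the compression rate)
    """
    n = len(utilities[0])
    # Pool the judges' utility scores for each sentence.
    pooled = [sum(judge[i] for judge in utilities) for i in range(n)]
    system_score = sum(pooled[i] for i in selected)
    # Upper bound: the k sentences with the highest pooled utility.
    best_score = sum(sorted(pooled, reverse=True)[:k])
    return system_score / best_score


def normalized_ru(system_ru, random_ru, interjudge_ru):
    """Hypothetical helper: rescale RU so 0 corresponds to random
    sentence selection and 1 to the interjudge (upper-bound) score."""
    return (system_ru - random_ru) / (interjudge_ru - random_ru)


# Illustrative data: two judges scoring a four-sentence document.
utilities = [[10, 4, 7, 1],
             [9, 6, 5, 2]]
print(relative_utility(selected=[0, 1], utilities=utilities, k=2))  # ~0.935
```

Under these assumptions, the random baseline for the normalization would be estimated by averaging `relative_utility` over many random k-sentence extracts, and the interjudge ceiling by scoring each judge's own top-k extract against the pooled utilities; this is one plausible reading of how RU accounts for random system performance and interjudge agreement, not a claim about the paper's exact procedure.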