A comparison of rankings produced by summarization evaluation measures

  • Authors:
  • Robert L. Donaway; Kevin W. Drummey; Laura A. Mather

  • Affiliations:
  • Ft. Meade, MD; Ft. Meade, MD; Britannica.com, Inc., La Jolla, CA

  • Venue:
  • NAACL-ANLP-AutoSum '00: Proceedings of the 2000 NAACL-ANLP Workshop on Automatic Summarization - Volume 4
  • Year:
  • 2000


Abstract

Summary evaluation measures produce a ranking of all possible extract summaries of a document. Recall-based evaluation measures, which depend on costly human-generated ground truth summaries, produce uncorrelated rankings when ground truth is varied. This paper proposes using sentence-rank-based and content-based measures for evaluating extract summaries, and compares these with recall-based evaluation measures. Content-based measures increase the correlation of rankings induced by synonymous ground truths, and exhibit other desirable properties.
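To make the idea of a content-based measure concrete, here is a minimal sketch (not the paper's exact formulation) of one common instance: ranking candidate extract summaries by the cosine similarity of their term-frequency vectors against the full document. The example document, candidate extracts, and helper names (`tf`, `cosine`) are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
import math

def tf(text):
    # illustrative term-frequency vector: whitespace tokenization, lowercased
    return Counter(text.lower().split())

def cosine(a, b):
    # cosine similarity between two term-frequency vectors
    num = sum(a[t] * b[t] for t in a if t in b)
    denom = (math.sqrt(sum(v * v for v in a.values()))
             * math.sqrt(sum(v * v for v in b.values())))
    return num / denom if denom else 0.0

# toy document and candidate extract summaries (hypothetical data)
document = ("the cat sat on the mat . the cat ate fish . "
            "fish are good for cats . mats are soft .")
candidates = {
    "A": "the cat sat on the mat .",
    "B": "fish are good for cats .",
    "C": "mats are soft .",
}

doc_vec = tf(document)
# a content-based measure induces a ranking over all candidate extracts,
# with no human-generated ground truth summary required
ranking = sorted(candidates,
                 key=lambda k: cosine(tf(candidates[k]), doc_vec),
                 reverse=True)
print(ranking)
```

Because such a measure compares each extract directly to the source document rather than to a particular reference summary, its ranking does not shift when one synonymous ground truth is swapped for another, which is the stability property the abstract highlights.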