A comparison of rankings produced by summarization evaluation measures

  • Authors:
  • Robert L. Donaway; Kevin W. Drummey; Laura A. Mather

  • Affiliations:
  • Ft. Meade, MD; Ft. Meade, MD; Britannica.com, Inc., La Jolla, CA

  • Venue:
  • NAACL-ANLP-AutoSum '00: Proceedings of the 2000 NAACL-ANLP Workshop on Automatic Summarization - Volume 4
  • Year:
  • 2000


Abstract

Summary evaluation measures produce a ranking of all possible extract summaries of a document. Recall-based evaluation measures, which depend on costly human-generated ground truth summaries, produce uncorrelated rankings when ground truth is varied. This paper proposes using sentence-rank-based and content-based measures for evaluating extract summaries, and compares these with recall-based evaluation measures. Content-based measures increase the correlation of rankings induced by synonymous ground truths, and exhibit other desirable properties.
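To make the idea of a content-based measure concrete, here is a minimal sketch (not the paper's exact formulation) of one common instance: ranking candidate extract summaries by the cosine similarity of their term-frequency vectors against the full document. The example document, candidate extracts, and helper names (`tf`, `cosine`) are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
import math

def tf(text):
    # illustrative term-frequency vector: whitespace tokenization, lowercased
    return Counter(text.lower().split())

def cosine(a, b):
    # cosine similarity between two term-frequency vectors
    num = sum(a[t] * b[t] for t in a if t in b)
    denom = (math.sqrt(sum(v * v for v in a.values()))
             * math.sqrt(sum(v * v for v in b.values())))
    return num / denom if denom else 0.0

# toy document and candidate extract summaries (hypothetical data)
document = ("the cat sat on the mat . the cat ate fish . "
            "fish are good for cats . mats are soft .")
candidates = {
    "A": "the cat sat on the mat .",
    "B": "fish are good for cats .",
    "C": "mats are soft .",
}

doc_vec = tf(document)
# a content-based measure induces a ranking over all candidate extracts,
# with no human-generated ground truth summary required
ranking = sorted(candidates,
                 key=lambda k: cosine(tf(candidates[k]), doc_vec),
                 reverse=True)
print(ranking)
```

Because such a measure compares each extract directly to the source document rather than to a particular reference summary, its ranking does not shift when one synonymous ground truth is swapped for another, which is the stability property the abstract highlights.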