Evaluation challenges in large-scale document summarization

  • Authors:
  • Dragomir R. Radev (U. of Michigan); Simone Teufel (U. of Cambridge); Horacio Saggion (U. of Sheffield); Wai Lam (Chinese U. of Hong Kong); John Blitzer (U. of Pennsylvania); Hong Qi (U. of Michigan); Arda Çelebi (USC/ISI); Danyu Liu (U. of Alabama); Elliott Drabek (Johns Hopkins U.)

  • Venue:
  • ACL '03: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics - Volume 1
  • Year:
  • 2003

Abstract

We present a large-scale meta evaluation of eight evaluation measures for both single-document and multi-document summarizers. To this end we built a corpus consisting of (a) 100 Million automatic summaries using six summarizers and baselines at ten summary lengths in both English and Chinese, (b) more than 10,000 manual abstracts and extracts, and (c) 200 Million automatic document and summary retrievals using 20 queries. We present both qualitative and quantitative results showing the strengths and drawbacks of all evaluation methods and how they rank the different summarizers.
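
Among the measures compared in this kind of meta evaluation are content-based ones that score a system summary by its lexical similarity to a reference summary. The sketch below is a minimal illustration of one such measure, bag-of-words cosine similarity, in Python; the function name, tokenization, and example strings are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not code from the paper): a simple content-based
# evaluation measure -- cosine similarity between the term-frequency
# vectors of a system summary and a reference summary.
from collections import Counter
from math import sqrt

def cosine_similarity(summary: str, reference: str) -> float:
    """Cosine similarity over bag-of-words term frequencies (whitespace tokens)."""
    s = Counter(summary.lower().split())
    r = Counter(reference.lower().split())
    dot = sum(s[w] * r[w] for w in set(s) & set(r))
    norm = sqrt(sum(v * v for v in s.values())) * sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0

# Example: a higher score means more word-level overlap with the reference.
print(cosine_similarity("the cat sat on the mat", "a cat was sitting on the mat"))
```

Measures of this family are cheap to compute over millions of system summaries, which is what makes a corpus-scale comparison against manual abstracts and extracts feasible.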