Topic-focused multi-document summarization using an approximate oracle score

  • Authors:
  • John M. Conroy; Judith D. Schlesinger; Dianne P. O'Leary

  • Affiliations:
  • IDA Center for Computing Sciences, Bowie, Maryland; IDA Center for Computing Sciences, Bowie, Maryland; University of Maryland, College Park, Maryland

  • Venue:
  • COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
  • Year:
  • 2006

Abstract

We consider the problem of producing a multi-document summary given a collection of documents. Since most successful methods of multi-document summarization are still largely extractive, we explore in this paper just how well an extractive method can perform. We introduce an "oracle" score, based on the probability distribution of unigrams in human summaries. We then demonstrate that, with the oracle score, we can generate extracts that score, on average, better than the human summaries when evaluated with ROUGE. In addition, we introduce an approximation to the oracle score that produces a system with the best known performance for the 2005 Document Understanding Conference (DUC) evaluation.
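
To make the idea of a unigram-based oracle score concrete, the following is a minimal sketch and not the authors' formulation, which the abstract does not spell out. It assumes the score of a candidate sentence is the average probability of its distinct terms under a unigram distribution estimated from the human summaries. The names `tokenize`, `unigram_distribution`, `oracle_score`, and `extract`, as well as the whitespace tokenizer and the top-k selection, are hypothetical choices made for illustration.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, strip punctuation.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def unigram_distribution(human_summaries):
    # Assumption: maximum-likelihood unigram estimate pooled over all summaries.
    counts = Counter()
    for summary in human_summaries:
        counts.update(tokenize(summary))
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def oracle_score(sentence, dist):
    # Assumption: average probability of the sentence's distinct terms.
    # Terms never seen in the human summaries contribute zero.
    terms = set(tokenize(sentence))
    if not terms:
        return 0.0
    return sum(dist.get(t, 0.0) for t in terms) / len(terms)


def extract(sentences, dist, k):
    # Pick the k highest-scoring sentences as a simple extract.
    return sorted(sentences, key=lambda s: oracle_score(s, dist), reverse=True)[:k]
```

For example, with a distribution built from the two toy summaries "the flood damaged the town" and "flood waters damaged homes", `extract` ranks "The flood damaged homes." above a sentence that shares no terms with the summaries. A real system would also need to handle stop words, redundancy between selected sentences, and a length limit, none of which this sketch addresses.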