Using N-Grams to Understand the Nature of Summaries

  • Authors:
  • Michele Banko; Lucy Vanderwende

  • Affiliations:
  • One Microsoft Way, Redmond, WA; One Microsoft Way, Redmond, WA

  • Venue:
  • HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
  • Year:
  • 2004

Abstract

Although single-document summarization is a well-studied task, the nature of multi-document summarization is only beginning to be studied in detail. While close attention has been paid to what technologies are necessary when moving from single- to multi-document summarization, the properties of human-written multi-document summaries have not been quantified. In this paper, we empirically characterize human-written summaries provided in a widely used summarization corpus by attempting to answer two questions: Can human-written multi-document summaries be characterized as extractive or generative? Are multi-document summaries less extractive than single-document summaries? Our results suggest that extraction-based techniques that have been successful for single-document summarization may not be sufficient when summarizing multiple documents.
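The abstract treats extractiveness as a measurable property, and the title points to n-grams as the measuring tool. As a minimal sketch, assuming extractiveness is approximated by the share of a summary's n-grams that appear verbatim in at least one source document (a hypothetical measure chosen for illustration, not necessarily the authors' exact procedure), the comparison could look like this. The function names, the lowercase whitespace tokenization, and the toy sentences are all illustrative assumptions.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_coverage(summary, sources, n):
    """Fraction of summary n-grams that also occur in at least one source.

    Hypothetical illustration: tokens are lowercased and split on whitespace;
    this is a stand-in measure, not the paper's exact procedure.
    """
    summary_grams = ngrams(summary.lower().split(), n)
    if not summary_grams:
        return 0.0  # empty or too-short summary: nothing to cover
    # Pool the n-grams of every source document (multi-document setting).
    source_grams = set()
    for doc in sources:
        source_grams.update(ngrams(doc.lower().split(), n))
    hits = sum(1 for g in summary_grams if g in source_grams)
    return hits / len(summary_grams)


if __name__ == "__main__":
    # Toy example sentences (invented for illustration only).
    sources = [
        "the storm hit the coast on monday",
        "officials said the storm caused flooding",
    ]
    summary = "the storm hit the coast and caused flooding"
    for n in (1, 2, 3):
        print(n, round(ngram_coverage(summary, sources, n), 3))
    # prints: 1 0.875, then 2 0.714, then 3 0.5 (one pair per line)
```

In this toy case nearly every word of the summary comes from the sources, but coverage falls as n grows because the summary stitches phrases from two documents together. Under this assumed measure, high unigram coverage paired with low coverage of longer n-grams would point toward a more generative summary, while uniformly high coverage would point toward a more extractive one.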