Extractive multi-document summaries should explicitly not contain document-specific content

  • Authors: Rebecca Mason; Eugene Charniak

  • Affiliations: Brown University, Providence, RI; Brown University, Providence, RI

  • Venue: WASDGML '11 Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages
  • Year: 2011

Abstract

Unsupervised approaches to multi-document summarization consist of two steps: finding a content model of the documents to be summarized, and then generating a summary that best represents the most salient information in those documents. In this paper, we present a sentence selection objective for extractive summarization in which sentences are penalized for containing content that is specific to the documents they were extracted from. We modify an existing system, HierSum (Haghighi & Vanderwende, 2009), to use our objective; the modified system significantly outperforms the original HierSum in pairwise user evaluation. Additionally, our ROUGE scores advance the current state of the art for both supervised and unsupervised systems with statistical significance.
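
The general idea of penalizing document-specific content during sentence selection can be illustrated with a small Python sketch. This is not the paper's objective, which the abstract does not spell out; it is a simplified stand-in that assumes two hypothetical word-weight tables, `content_weight` (words salient across the whole document set) and `specific_weight` (words tied to a single source document), and ranks each sentence by reward minus penalty. The function names, the `penalty` parameter, and all weights are illustrative assumptions.

```python
def score_sentence(tokens, content_weight, specific_weight, penalty=1.0):
    """Score one sentence: reward shared content words, penalize
    document-specific words. Both weight tables are hypothetical inputs;
    a real system would derive them from a content model."""
    vocab = set(tokens)  # count each word type once per sentence
    reward = sum(content_weight.get(t, 0.0) for t in vocab)
    cost = sum(specific_weight.get(t, 0.0) for t in vocab)
    return reward - penalty * cost


def select_sentences(sentences, content_weight, specific_weight,
                     max_sentences=2, penalty=1.0):
    """Return the top-scoring sentences (simple ranking, no redundancy
    handling). Ties keep their original order because sorted() is stable."""
    ranked = sorted(
        sentences,
        key=lambda s: score_sentence(
            s.lower().split(), content_weight, specific_weight, penalty
        ),
        reverse=True,
    )
    return ranked[:max_sentences]


# Toy data: all words and weights below are made up for illustration.
content_weight = {"earthquake": 1.0, "rescue": 0.8}
specific_weight = {"reuters": 0.9, "tuesday": 0.5}
sentences = [
    "Reuters reported the earthquake on Tuesday",
    "Rescue teams reached the earthquake zone",
    "The earthquake damaged roads",
]

summary = select_sentences(sentences, content_weight, specific_weight)
baseline = select_sentences(sentences, content_weight, specific_weight,
                            penalty=0.0)
```

With the penalty switched off (`penalty=0.0`), the sentence that mentions the news agency and the day of the week ranks as highly as the plain factual one; with the penalty on, it drops out of the two-sentence summary. A real system would also need to handle redundancy between selected sentences and a length budget, which this sketch leaves out.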