A hybrid hierarchical model for multi-document summarization

  • Authors:
  • Asli Celikyilmaz;Dilek Hakkani-Tur

  • Affiliations:
  • University of California, Berkeley;International Computer Science Institute, Berkeley, CA

  • Venue:
  • ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two step learning problem building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model. Then, using these scores, we train a regression model based on the lexical and structural characteristics of the sentences, and use the model to score sentences of new documents to form a summary. Our system advances current state-of-the-art improving ROUGE scores by ~7%. Generated summaries are less redundant and more coherent based upon manual quality evaluations.