Capturing sentence prior for query-based multi-document summarization

  • Authors:
  • J. Jagadeesh;Prasad Pingali;Vasudeva Varma

  • Affiliations:
  • Microsoft Research Lab, India;International Institute of Information Technology, India;International Institute of Information Technology, India

  • Venue:
  • Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we have considered a real world information synthesis task, generation of a fixed length multi document summary which satisfies a specific information need. This task was mapped to a topic-oriented, informative multi-document summarization. We also tried to estimate, given the human written reference summaries and the document set, the maximum performance (ROUGE scores) that can be achieved by an extraction-based summarization technique. Motivated by the observation that the current approaches are far behind the estimated maximum performance, we have looked at Information Retrieval techniques to improve the relevance scoring of sentences towards information need. Following information theoretic approach we have identified a measure to capture the notion of importance or prior of a sentence. Following a different decomposition of Probability Ranking Principle, the calculated importance/prior is incorporated into the final sentence scoring by weighted linear combination. In order to evaluate the performance, we have explored information sources like WWW and encyclopedia in computing the information measure in a set of different experiments. The t-test analysis of the improvement on DUC 2005 data set is found to be significant (p ~ 0.05). The same system has outperformed rest of the systems at DUC 2006 challenge in terms of ROUGE scores with a significant margin over the next best system.