Capturing sentence prior for query-based multi-document summarization

Authors:
J. Jagadeesh; Prasad Pingali; Vasudeva Varma
Affiliations:
Microsoft Research Lab, India; International Institute of Information Technology, India; International Institute of Information Technology, India
Venue:
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Year:
2007

Abstract

In this paper, we consider a real-world information synthesis task: generating a fixed-length multi-document summary that satisfies a specific information need. We map this task to topic-oriented, informative multi-document summarization. Given the human-written reference summaries and the document set, we also estimate the maximum performance (ROUGE scores) that an extraction-based summarization technique can achieve. Motivated by the observation that current approaches fall far short of this estimated maximum, we turn to Information Retrieval techniques to improve the relevance scoring of sentences with respect to the information need. Following an information-theoretic approach, we identify a measure that captures the importance, or prior, of a sentence. Using a different decomposition of the Probability Ranking Principle, we incorporate this importance/prior into the final sentence score through a weighted linear combination. To evaluate performance, we run a set of experiments that use information sources such as the WWW and an encyclopedia to compute the information measure. A t-test shows that the improvement on the DUC 2005 data set is significant (p ~ 0.05). The same system outperformed all other systems at the DUC 2006 challenge in terms of ROUGE scores, with a significant margin over the next-best system.
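The abstract does not spell out the exact information measure or the relevance model. The sketch below shows one way a sentence prior could be folded into sentence scoring by weighted linear combination. It assumes the prior is the average self-information, -log2 P(w), of a sentence's words under a unigram model estimated from an external corpus, which stands in for the WWW or encyclopedia sources. It also assumes a simple query-term-overlap relevance score. All function names, the add-one smoothing, the min-max normalization, and the weight `alpha=0.8` are hypothetical choices for illustration, not the paper's method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def background_prob(corpus_texts):
    """Unigram model with add-one smoothing over an external corpus (hypothetical).

    Stands in for an outside 'information source' such as web or encyclopedia text.
    """
    counts = Counter(tok for text in corpus_texts for tok in tokenize(text))
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen words
    return lambda w: (counts[w] + 1) / (total + vocab)


def sentence_prior(sentence, prob):
    """Hypothetical prior: average self-information -log2 P(w) of the sentence's words."""
    toks = tokenize(sentence)
    if not toks:
        return 0.0
    return sum(-math.log2(prob(w)) for w in toks) / len(toks)


def query_relevance(sentence, query):
    """Placeholder relevance: fraction of distinct query terms found in the sentence."""
    q = set(tokenize(query))
    return len(q & set(tokenize(sentence))) / len(q) if q else 0.0


def rank_sentences(sentences, query, prob, alpha=0.8):
    """Score = alpha * relevance + (1 - alpha) * normalized prior; best first."""
    priors = [sentence_prior(s, prob) for s in sentences]
    lo, hi = min(priors), max(priors)
    span = (hi - lo) or 1.0  # avoid division by zero when all priors are equal
    scored = []
    for s, p in zip(sentences, priors):
        norm_prior = (p - lo) / span  # min-max scale the prior into [0, 1]
        score = alpha * query_relevance(s, query) + (1 - alpha) * norm_prior
        scored.append((score, s))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Toy background corpus and candidate sentences, invented for illustration.
    bg = background_prob([
        "the cat sat on the mat",
        "the dog ran in the park",
        "the weather is nice today",
    ])
    candidates = [
        "The hurricane caused flooding in coastal towns.",
        "The weather is nice.",
        "Officials said the hurricane damage was severe.",
    ]
    for score, sent in rank_sentences(candidates, "hurricane damage", bg):
        print(f"{score:.3f}  {sent}")
```

Min-max normalization puts the prior on the same [0, 1] scale as the relevance score, so `alpha` directly controls how much the prior can reorder sentences that are equally relevant to the query. In a real system the weight would typically be tuned on held-out data rather than fixed by hand.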