Measuring importance and query relevance in topic-focused multi-document summarization

Authors:
Surabhi Gupta; Ani Nenkova; Dan Jurafsky
Affiliations:
Stanford University, Stanford, CA (all three authors)
Venue:
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Year:
2007

Abstract

The increasing complexity of summarization systems makes it difficult to analyze exactly which modules make a difference in performance. We carried out a principled comparison between the two most commonly used schemes for assigning importance to words in the context of query-focused multi-document summarization: raw frequency (word probability) and log-likelihood ratio (LLR). We demonstrate that the advantages of LLR come from its known distributional properties, which allow for the identification of a set of words that in its entirety defines the aboutness of the input. We also find that LLR is more suitable for query-focused summarization since, unlike raw frequency, it is more sensitive to the integration of the information need defined by the user.
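The two word-weighting schemes compared above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes pre-tokenized text, a separate background corpus for the LLR statistic (as in topic-signature work), Dunning's binomial form of the log-likelihood ratio, and a cutoff of 10.83, the chi-square critical value at p = 0.001 with one degree of freedom. That cutoff is the kind of distributional property that lets LLR select a whole *set* of descriptive words instead of only producing a ranking. All function names are hypothetical, and `query_focused_words` is an illustrative assumption about query integration, not the paper's exact scheme.

```python
import math
from collections import Counter

# Chi-square critical value with 1 degree of freedom at p = 0.001.
CHI2_CUTOFF = 10.83


def word_probabilities(tokens):
    """Raw-frequency weighting: p(w) = count(w) / N over the input documents."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def _xlogy(x, y):
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _binom_loglik(k, n, p):
    """Binomial log-likelihood without the binomial coefficient (it cancels in the ratio)."""
    return _xlogy(k, p) + _xlogy(n - k, 1.0 - p)


def llr(k1, n1, k2, n2):
    """Dunning-style -2 log(lambda) for a word seen k1 times in n1 input tokens
    and k2 times in n2 background tokens."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled estimate under the null hypothesis
    return 2.0 * (_binom_loglik(k1, n1, p1) + _binom_loglik(k2, n2, p2)
                  - _binom_loglik(k1, n1, p) - _binom_loglik(k2, n2, p))


def topic_signature(input_tokens, background_tokens, cutoff=CHI2_CUTOFF):
    """Words significantly MORE frequent in the input than in the background."""
    inp, bg = Counter(input_tokens), Counter(background_tokens)
    n1, n2 = len(input_tokens), len(background_tokens)
    signature = set()
    for w, k1 in inp.items():
        k2 = bg[w]  # Counter returns 0 for words unseen in the background
        if k1 / n1 > k2 / n2 and llr(k1, n1, k2, n2) > cutoff:
            signature.add(w)
    return signature


def signature_score(sentence_tokens, important_words):
    """Fraction of a sentence's tokens that belong to the important-word set."""
    if not sentence_tokens:
        return 0.0
    return sum(w in important_words for w in sentence_tokens) / len(sentence_tokens)


def query_focused_words(signature, query_tokens, input_tokens):
    """Hypothetical query integration (an assumption, not the paper's scheme):
    also treat query words that occur in the input as important."""
    return signature | (set(query_tokens) & set(input_tokens))
```

On a toy input where a function word such as "the" occurs at the same rate as in the background corpus, raw probability ranks it first, while the LLR test excludes it because its rate does not differ from the background.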