Understanding the process of multi-document summarization: content selection, rewriting and evaluation

Authors:
Kathleen McKeown; Ani Nenkova
Affiliations:
Columbia University;Columbia University
Venue:
Understanding the process of multi-document summarization: content selection, rewriting and evaluation
Year:
2006

Citing 0
Cited 8

Summarization system evaluation revisited: N-gram graphs
ACM Transactions on Speech and Language Processing (TSLP)

A Technique for Summarizing Web Reviews
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01

Lessons learned from large scale evaluation of systems that produce text: nightmares and pleasant surprises
INLG '06 Proceedings of the Fourth International Natural Language Generation Conference

Generating referring expressions in context: the GREC task evaluation challenges
Empirical methods in natural language generation

Towards a framework for abstractive summarization of multimodal documents
HLT-SS '11 Proceedings of the ACL 2011 Student Session

A behavioural mode research on user-focus summarization
Mathematical and Computer Modelling: An International Journal

Detecting human features in summaries --- symbol sequence statistical regularity
SETN'12 Proceedings of the 7th Hellenic conference on Artificial Intelligence: theories and applications

Summary evaluation: together we stand NPowER-ed
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2

Abstract

Recent years have seen unprecedented interest in news aggregation and browsing, with dedicated corporate and research websites becoming increasingly popular. Generic multi-document summarization can enhance users' experience with such sites, and thus the development and evaluation of automatic summarization systems has become not only a research challenge but also a very practical one. In this thesis, we describe a general modular automatic summarizer that achieves state-of-the-art performance, present our experiments with the rewriting of generic noun phrases and of references to people, and demonstrate how distinctions such as the familiarity and salience of entities mentioned in the input can be determined automatically. We also propose an intrinsic evaluation method for summarization that incorporates the use of multiple models and allows a better study of human agreement in content selection. Our investigations and experiments have helped us to better understand the process of summarization and to formulate tasks that we believe will lead to future improvements in automatic summarization.

It is well known that humans do not fully agree on what content should be included in a summary. Traditionally, this phenomenon has been studied at the level of sentences, but sentences are a rather coarse level of granularity for content analysis. Here, we introduce an annotation method for semantically driven comparison of several texts for similarities and differences at the subsentential level. When applied to human summaries of the same input, the method allows for a better examination of human agreement, and also provides the basis for an evaluation method that incorporates the notion of the importance of a content unit in a summary.

Given the variability of human choices, we next address the question of which features in the input are predictive of the inclusion of content in the summary. We use a large collection of human-written summaries and their respective inputs to study the predictive effect of one feature that has been widely used in summarization: frequency of occurrence. We show that content units that are repeated frequently in the input tend to be included in at least some human summaries, and that human summarizers tend to agree more on the inclusion of frequent content units. In addition, human summaries tend to have higher likelihood under a multinomial model estimated from the input than automatic summaries do. This empirical investigation leads us to propose an algorithm for a context-sensitive frequency-based summarizer. We show that context sensitivity and a good choice of composition function for estimating the weight of a sentence lead to a summarizer that performs as well as the best supervised automatic summarizer.

We then turn to exploring methods for summary rewrite, that is, techniques for automatically modifying the original author's wording of sentences that are included in a summary. The added flexibility of subsentential changes has potential benefits for improving content selection as well as summary readability. We show that human readers prefer summaries in which references to people have been rewritten to restore the fluency of the text. We further develop our work on references to people by presenting an approach to the automatic classification of entity salience and familiarity, based on robustly derivable lexical, syntactic and frequency features. Such information is necessary for the generation of appropriate referring expressions.
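
The abstract names a context-sensitive frequency-based summarizer but does not spell out the algorithm, so the following is only a minimal sketch of how such a summarizer might work, not the thesis's actual method. It assumes three things that the abstract does not state: word probabilities are plain unigram estimates from the input, the composition function is the average word probability of a sentence, and context sensitivity is modeled by squaring the probability of every word already covered by a selected sentence. The names `tokenize` and `summarize` are hypothetical.

```python
from collections import Counter


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in sentence.lower().split())
    return [w for w in words if w]


def summarize(sentences, max_sentences=2):
    """Greedy frequency-based sentence selection with a context-sensitive update."""
    tokenized = [tokenize(s) for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    # Unigram probabilities estimated from the whole input.
    prob = {w: c / total for w, c in counts.items()}

    def weight(i):
        # Composition function (assumption): average word probability.
        return sum(prob[w] for w in tokenized[i]) / len(tokenized[i])

    chosen = []
    remaining = [i for i, toks in enumerate(tokenized) if toks]
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=weight)
        chosen.append(best)
        remaining.remove(best)
        # Context sensitivity (assumption): words already in the summary
        # become less attractive, which discourages redundant sentences.
        for w in set(tokenized[best]):
            prob[w] = prob[w] ** 2
    return [sentences[i] for i in chosen]
```

For the toy input `["storm hits coast", "storm hits coast again", "shelters open inland"]` with `max_sentences=2`, this sketch selects the first and third sentences: once the first sentence is chosen, the update lowers the weight of its near-duplicate below that of the sentence carrying new content. Without the update step, the near-duplicate would be selected instead.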