The effects of analysing cohesion on document summarisation

Authors:
Branimir K. Boguraev;Mary S. Neff
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Year:
2000

Citing 8

A full-text retrieval system with a dynamic abstract generation function
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval

Automatic condensation of electronic publications by sentence selection
Information Processing and Management: an International Journal - Special issue: summarizing text

New Methods in Automatic Extracting
Journal of the ACM (JACM)

Lexical cohesion computed by thesaural relations as an indicator of the structure of text
Computational Linguistics

The TIPSTER SUMMAC Text Summarization Evaluation
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics

Multi-paragraph segmentation of expository text
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics

Common Topics and Coherent Situations: interpreting ellipsis in the context of discourse inference
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics

Anaphora for everyone: pronominal anaphora resolution without a parser
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1

Cited 1

Using maximum entropy for sentence extraction
AS '02 Proceedings of the ACL-02 Workshop on Automatic Summarization - Volume 4

Abstract

We argue that in general, the analysis of lexical cohesion factors in a document can drive a summarizer, as well as enable other content characterization tasks. More narrowly, this paper focuses on how one particular cohesion factor, simple lexical repetition, can enhance an existing sentence extraction summarizer, by enabling strategies for overcoming some particularly jarring end-user effects in the summaries, typically due to coherence degradation, readability deterioration, and topical under-representation. Lexical repetition is instrumental to, among other things, the topical make-up of a text, and in our framework a lexical repetition-based model of discourse segmentation, capable of detecting topic shifts, is integrated with a linguistically-aware summarizer utilizing notions of salience and dynamically-adjustable summary size. We show that even by leveraging lexical repetition alone, summaries are of comparable, and under certain conditions better, quality than the ones delivered by a state-of-the-art summarizer. This is encouraging for a broad research platform focusing on the recognition and use of cohesive devices in text for a range of content characterisation and document management tasks.
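
To make the idea of repetition-based topic-shift detection concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a simple block-comparison scheme, loosely in the style of segmenters such as TextTiling: each gap between sentences is scored by the cosine similarity of the word counts on either side, and a low score is read as a possible topic shift. The function names, the `window` size, and the `threshold` value are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokens; a naive stand-in for real lexical analysis."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(sentences, window=2):
    """Score each gap between adjacent sentences by lexical repetition across it.

    The score for the gap before sentences[i] compares the word counts of up to
    `window` sentences on the left with up to `window` sentences on the right.
    """
    bags = [Counter(tokenize(s)) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def topic_shifts(sentences, window=2, threshold=0.1):
    """Return each index i where a shift is hypothesised just before sentences[i].

    `threshold` is an assumed cut-off: gaps whose repetition score falls below
    it are treated as candidate topic boundaries.
    """
    scores = gap_scores(sentences, window)
    return [i + 1 for i, score in enumerate(scores) if score < threshold]


if __name__ == "__main__":
    text = [
        "The cat chased the mouse.",
        "The mouse fled from the cat.",
        "Stocks rallied on strong earnings.",
        "Earnings lifted stocks again.",
    ]
    print(topic_shifts(text))  # prints [2]
```

In this toy input no word is repeated across the gap before the third sentence, so that gap scores zero and is flagged. A practical system would at least filter function words and smooth the score sequence before choosing boundaries.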