Fast generation of abstracts from general domain text corpora by extracting relevant sentences

  • Author: Klaus Zechner
  • Affiliation: Carnegie Mellon University, Pittsburgh, PA
  • Venue: COLING '96: Proceedings of the 16th Conference on Computational Linguistics, Volume 2
  • Year: 1996

Abstract

This paper describes a system for generating text abstracts that relies on a general, purely statistical principle: the notion of "relevance", defined in terms of the combined tf*idf weights of the words in a sentence. The system generates abstracts of newspaper articles by selecting the "most relevant" sentences and combining them in text order. Since neither domain knowledge nor text-sort-specific heuristics are involved, the system offers maximal generality and flexibility. It is also fast and can be implemented efficiently for both on-line and off-line use. An experiment shows that recall and precision for the extracted sentences (taking the sentences extracted by human subjects as a baseline) are within the same range as recall and precision when the human subjects are compared with one another; in other words, the system's performance is indistinguishable from that of a human abstractor. Finally, the system yields significantly better results than a default "lead" algorithm, which simply chooses the first few sentences of the text.
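As a rough illustration of the extraction principle summarized above, the following Python sketch scores each sentence by summing the tf*idf weights of its words, keeps the top-scoring sentences, and returns them in text order. It also includes a "lead" baseline and a sentence-level precision/recall comparison against a reference extract. This is not Zechner's implementation. The tokenizer, the plain sum as the "combination" of weights, idf = log(N/df) over a background corpus, the treatment of unseen words as if df = 1, and every function name (`idf_weights`, `score_sentences`, `extract`, `lead`, `precision_recall`) are assumptions made for illustration. The toy corpus and the reference extract are likewise hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercased alphabetic tokens, no stemming or stop-word list.
    return re.findall(r"[a-z]+", text.lower())


def idf_weights(background_docs):
    # idf(t) = log(N / df(t)) over a hypothetical background corpus of N documents.
    n = len(background_docs)
    df = Counter()
    for doc in background_docs:
        df.update(set(tokenize(doc)))
    idf = {term: math.log(n / count) for term, count in df.items()}
    # Assumption: words absent from the background corpus are weighted as if df = 1.
    unseen_idf = math.log(n) if n > 0 else 0.0
    return idf, unseen_idf


def score_sentences(sentences, idf, unseen_idf):
    # Term frequencies come from the article being abstracted.
    tf = Counter(tok for s in sentences for tok in tokenize(s))
    # Assumption: a sentence's relevance is the plain sum of its words' tf*idf weights.
    return [sum(tf[t] * idf.get(t, unseen_idf) for t in tokenize(s)) for s in sentences]


def extract(sentences, idf, unseen_idf, k):
    # Rank by score (ties broken by position), keep k, restore original text order.
    scores = score_sentences(sentences, idf, unseen_idf)
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


def lead(sentences, k):
    # "Lead" baseline: simply the first k sentences.
    return list(range(min(k, len(sentences))))


def precision_recall(selected, reference):
    # Sentence-level overlap between an extract and a reference extract.
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy usage with hypothetical data.
background = [
    "the market rose today",
    "the team won the game",
    "the weather was mild",
    "the council met today",
]
article = [
    "The council met on Monday.",
    "The budget vote on the budget was delayed.",
    "The weather was mild.",
    "Officials said the budget vote would happen next week.",
]
idf, unseen = idf_weights(background)
picked = extract(article, idf, unseen, k=2)
print([article[i] for i in picked])
# prints ['The budget vote on the budget was delayed.', 'Officials said the budget vote would happen next week.']
```

Because "the" appears in every background document, its idf is zero and it contributes nothing; sentences rich in article-specific words such as "budget" and "vote" rise to the top. Against a hypothetical reference extract `[1, 3]`, the sketch's extract scores precision and recall of 1.0, while `lead(article, 2)` scores 0.5 for both. These figures come from the toy data only and say nothing about the paper's reported results.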