Text Entailment for Logical Segmentation and Summarization

  • Authors:
  • Doina Tatar; Andreea Diana Mihis; Dana Lupsa

  • Affiliations:
  • University "Babes-Bolyai", Cluj-Napoca, Romania; University "Babes-Bolyai", Cluj-Napoca, Romania; University "Babes-Bolyai", Cluj-Napoca, Romania

  • Venue:
  • NLDB '08: Proceedings of the 13th International Conference on Natural Language and Information Systems: Applications of Natural Language to Information Systems
  • Year:
  • 2008

Abstract

Summarization is the process of condensing a source text into a shorter version while preserving its information content ([2]). This paper presents some original methods for extractive summarization of a single source document, based on an intuition not explored until now: the logical structure of a text. The summarization relies on an original linear segmentation algorithm, which we call logical segmentation (LTT) because the score of a sentence is the number of sentences of the text that it entails. The summary is obtained by three methods: selecting the first sentence(s) from each segment, selecting the best-scored sentence(s) from each segment, and selecting the most informative sentence(s) from each segment (relative to those previously selected). Moreover, our methods allow the size of the derived summary to be adjusted dynamically, independently of the number of segments. Alternatively, we propose a Dynamic Programming (DP) method, based on the continuity principle and applied to the sentences scored logically as above. This method first obtains the summary and then determines the segments. Our segmentation methods are applied to, and evaluated against, the segmentation of the text "I spent the first 19 years" by Morris and Hirst ([17]). The original text is reproduced at [26]. We give statistics on the informativeness, relative to the original (summarized) text, of summaries of different lengths obtained with the above methods. These statistics show that segmenting a text before summarizing it can improve the quality of the resulting summaries.
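To make the logical scoring and the per-segment selection concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how entailment between two sentences is decided or exactly where LTT places segment boundaries, so both are assumptions. `entails` is a hypothetical lexical-inclusion test (the text "entails" the hypothesis if every content word of the hypothesis also occurs in the text), and `segment` uses a hypothetical rule that starts a new segment right after each strict local minimum of the score sequence. Only the first two selection methods (first sentence, best-scored sentence) are shown; the informativeness-based method and the DP variant are not sketched.

```python
import re

# Hypothetical stop-word list, used only by the toy entailment test below.
STOP = {"a", "an", "and", "at", "in", "is", "it", "of", "the", "to", "was"}


def content_words(sentence: str) -> set[str]:
    """Lower-cased alphabetic tokens, minus the small stop-word list."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP}


def entails(text: str, hypothesis: str) -> bool:
    """ASSUMPTION: a crude stand-in for textual entailment, not the paper's test.
    `text` entails `hypothesis` if all content words of the hypothesis occur in it."""
    hyp = content_words(hypothesis)
    return bool(hyp) and hyp <= content_words(text)


def logical_scores(sentences: list[str]) -> list[int]:
    """Score of sentence i = number of *other* sentences that it entails."""
    return [
        sum(1 for j, h in enumerate(sentences) if j != i and entails(t, h))
        for i, t in enumerate(sentences)
    ]


def segment(scores: list[int]) -> list[tuple[int, int]]:
    """ASSUMED boundary rule: a new segment begins right after each strict
    local minimum of the score curve. Returns half-open (start, end) pairs."""
    if not scores:
        return []
    starts = [0] + [
        i + 1
        for i in range(1, len(scores) - 1)
        if scores[i - 1] > scores[i] < scores[i + 1]
    ]
    ends = starts[1:] + [len(scores)]
    return list(zip(starts, ends))


def summarize(sentences: list[str], method: str = "best") -> list[str]:
    """'first': the first sentence of each segment;
    'best': the highest-scored sentence of each segment (earliest wins ties)."""
    scores = logical_scores(sentences)
    picked = []
    for start, end in segment(scores):
        if method == "first":
            picked.append(start)
        else:
            picked.append(max(range(start, end), key=lambda k: scores[k]))
    return [sentences[k] for k in picked]


# Usage on a tiny made-up text (scores come out as [0, 2, 0, 1, 0]).
TEXT = [
    "The storm came.",
    "The storm came and hit the coast.",
    "It hit the coast.",
    "Many boats sank in the harbour.",
    "Boats sank.",
]
print(summarize(TEXT, "best"))
# ['The storm came and hit the coast.', 'Many boats sank in the harbour.']
```

Because scoring only calls `entails`, a real entailment recognizer could be substituted for the toy lexical test without touching the segmentation or selection steps.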