Text Segmentation into Paragraphs Based on Local Text Cohesion

  • Authors:
  • Igor A. Bolshakov;Alexander F. Gelbukh

  • Affiliations:
  • -;-

  • Venue:
  • TSD '01 Proceedings of the 4th International Conference on Text, Speech and Dialogue
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

The problem of automatic text segmentation is subcategorized into two different problems: thematic segmentation into rather large topically self-contained sections and splitting into paragraphs, i.e., lexico-grammatical segmentation of lower level. In this paper we consider the latter problem. We propose a method of reasonably splitting text into paragraph based on a text cohesion measure. Specifically, we propose a method of quantitative evaluation of text cohesion based on a large linguistic resource - a collocation network. At each step, our algorithm compares word occurrences in a text against a large DB of collocations and semantic links between words in the given natural language. The procedure consists in evaluation of the cohesion function, its smoothing, normalization, and comparing with a specially constructed threshold.