Efficient linear text segmentation based on information retrieval techniques

  • Authors:
  • Roman Kern;Michael Granitzer

  • Affiliations:
  • Know-Center, Graz;Graz University of Technology, Graz

  • Venue:
  • Proceedings of the International Conference on Management of Emergent Digital EcoSystems
  • Year:
  • 2009

Abstract

The task of linear text segmentation is to split a large text document into shorter fragments, usually blocks of consecutive sentences. The algorithms that have demonstrated the best performance on this task come at the price of high computational complexity. In this work we present an algorithm with a computational complexity of O(n), where n is the number of sentences in a document. We evaluate our approach against algorithms of higher complexity on standard benchmark data sets and demonstrate that it provides comparable accuracy.
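
The abstract does not describe the algorithm itself. Purely as an illustration of how a segmenter can run in time linear in the number of sentences, the sketch below uses a generic sliding-window comparison of adjacent sentence blocks (in the spirit of TextTiling); it is not the authors' method. The function names, the `window` and `threshold` parameters, and the simple threshold rule are all assumptions made for this example. For a fixed window size and bounded sentence length, each gap between sentences costs constant work, so the total is O(n).

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment(sentences, window=2, threshold=0.1):
    """Return the indices at which a new segment starts.

    Hypothetical rule (an assumption, not taken from the paper): a boundary
    is placed at every gap where the `window` sentences before it and the
    `window` sentences after it have cosine similarity below `threshold`.
    """
    bags = [Counter(tokenize(s)) for s in sentences]
    boundaries = []
    for gap in range(1, len(bags)):
        # Merge term counts of the blocks on either side of the gap.
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        if cosine(left, right) < threshold:
            boundaries.append(gap)
    return boundaries


sentences = [
    "Cats purr and cats sleep.",
    "Cats chase mice and sleep.",
    "Stocks fell as markets closed.",
    "Markets recovered while stocks rose.",
]
print(segment(sentences))
```

On this toy input, the only gap whose neighbouring blocks share no vocabulary is the one before the third sentence, so that is where the sketch places its single boundary. A practical segmenter would typically add stop-word removal, term weighting, and a less naive boundary criterion.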