Text segmentation using context overlap

  • Authors:
  • Radim Řehůřek

  • Affiliations:
  • Faculty of Informatics, Masaryk University

  • Venue:
  • EPIA'07 Proceedings of the aritficial intelligence 13th Portuguese conference on Progress in artificial intelligence
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we propose features desirable of linear text segmentation algorithms for the Information Retrieval domain, with emphasis on improving high similarity search of heterogeneous texts. We proceed to describe a robust purely statistical method, based on context overlap exploitation, that exhibits these desired features. Experimental results are presented, along with comparison to other existing algorithms.