Unsupervised methods of topical text segmentation for Polish

  • Authors:
  • Dominik Flejter;Karol Wieloch;Witold Abramowicz

  • Affiliations:
  • University of Economics, Poznań, Poland;University of Economics, Poznań, Poland;University of Economics, Poznań, Poland

  • Venue:
  • ACL '07 Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies
  • Year:
  • 2007

Abstract

This paper describes a study of how existing unsupervised topical segmentation algorithms perform when applied to Polish plain-text documents. To measure performance, five existing topical segmentation algorithms were selected, three Polish test collections were created, and seven approaches to text pre-processing were implemented. Based on quantitative results (Pk and WindowDiff metrics), the use of a specific algorithm was recommended and the impact of the pre-processing strategies was assessed. Because standardized metrics were used and the test collections were built following a previously described methodology, comparative results for Polish and English were also obtained.
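
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how Pk and WindowDiff are commonly computed. It follows the widely cited formulations (Beeferman et al. for Pk, Pevzner and Hearst for WindowDiff) and is not taken from the paper itself. The input representation (a list of segment lengths counted in sentences), the function names, and the default window size (half the mean reference segment length) are assumptions made for illustration.

```python
def to_labels(segment_lengths):
    """Expand segment lengths, e.g. [3, 3], into per-unit segment ids [0,0,0,1,1,1]."""
    labels = []
    for seg_id, length in enumerate(segment_lengths):
        labels.extend([seg_id] * length)
    return labels


def default_k(ref_lengths):
    """Assumed convention: window = half the mean reference segment length."""
    mean_len = sum(ref_lengths) / len(ref_lengths)
    return max(1, round(mean_len / 2))


def _prepare(ref_lengths, hyp_lengths, k):
    ref, hyp = to_labels(ref_lengths), to_labels(hyp_lengths)
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must cover the same units")
    if k is None:
        k = default_k(ref_lengths)
    if not 0 < k < len(ref):
        raise ValueError("window size k must satisfy 0 < k < number of units")
    return ref, hyp, k


def pk(ref_lengths, hyp_lengths, k=None):
    """Pk: share of windows where ref and hyp disagree on whether
    units i and i+k belong to the same segment (lower is better)."""
    ref, hyp, k = _prepare(ref_lengths, hyp_lengths, k)
    n = len(ref)
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k]) for i in range(n - k)
    )
    return errors / (n - k)


def window_diff(ref_lengths, hyp_lengths, k=None):
    """WindowDiff: share of windows where ref and hyp contain a different
    number of boundaries between units i and i+k (lower is better)."""
    ref, hyp, k = _prepare(ref_lengths, hyp_lengths, k)
    n = len(ref)
    # Segment ids grow by 1 at each boundary, so the id difference
    # across a window equals the number of boundaries inside it.
    errors = sum(
        (ref[i + k] - ref[i]) != (hyp[i + k] - hyp[i]) for i in range(n - k)
    )
    return errors / (n - k)


if __name__ == "__main__":
    reference = [3, 3]      # hypothetical gold segmentation of 6 sentences
    hypothesis = [2, 1, 3]  # hypothetical system output with one extra boundary
    print("Pk:", pk(reference, hypothesis))
    print("WindowDiff:", window_diff(reference, hypothesis))
```

Both scores are error rates in the range 0 to 1. WindowDiff compares boundary counts rather than same-segment membership, so it also penalizes spurious boundaries that fall close to a true one, which Pk can miss.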