Subtopic structuring for full-length document access
SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Assessing agreement on classification tasks: the kappa statistic
Computational Linguistics
Statistical Models for Text Segmentation
Machine Learning - Special issue on natural language learning
Topic segmentation with an aspect hidden Markov model
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A critique and improvement of an evaluation metric for text segmentation
Computational Linguistics
ECDL '97 Proceedings of the First European Conference on Research and Advanced Technology for Digital Libraries
Topic segmentation: algorithms and applications
Topic segmentation: algorithms and applications
Lexical cohesion computed by thesaural relations as an indicator of the structure of text
Computational Linguistics
Discourse segmentation by human and automated means
Computational Linguistics
Advances in domain independent linear text segmentation
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Text segmentation based on similarity between words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
An automatic method of finding topic boundaries
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Cohesion and collocation: using context vectors in text segmentation
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Hi-index | 0.00 |
This paper describes a study on performance of existing unsupervised algorithms of text documents topical segmentation when applied to Polish plain text documents. For performance measurement five existing topical segmentation algorithms were selected, three different Polish test collections were created and seven approaches to text pre-processing were implemented. Based on quantitative results (Pk and WindowDiff metrics) use of specific algorithm was recommended and impact of pre-processing strategies was assessed. Thanks to use of standardized metrics and application of previously described methodology for test collection development, comparative results for Polish and English were also obtained.