Automatic linear text segmentation, which aims to detect the best topic boundaries, is a difficult but very useful task in many text processing systems. Several methods have addressed this problem with reasonable results, but they also have drawbacks. In this work, we propose a new method, called ClustSeg, which uses a predefined window and a clustering algorithm to decide topic cohesion. We compare our proposal against the best-known methods, and it performs better than these algorithms.