Text segmentation by clustering cohesion

Authors:
Raúl Abella Pérez;José Eladio Medina Pagola
Affiliations:
Advanced Technologies Application Centre, Ciudad de la Habana, Cuba;Advanced Technologies Application Centre, Ciudad de la Habana, Cuba
Venue:
CIARP'10 Proceedings of the 15th Iberoamerican congress conference on Progress in pattern recognition, image analysis, computer vision, and applications
Year:
2010

Abstract

Automatic linear text segmentation, which aims to detect the best topic boundaries in a text, is a difficult but very useful task in many text processing systems. Several methods have addressed this problem with reasonable results, but they also have drawbacks. In this work, we propose a new method, called ClustSeg, which uses a predefined window and a clustering algorithm to decide topic cohesion. We compare our proposal against the best-known methods and obtain better performance than these algorithms.