Information retrieval: data structures and algorithms
Information retrieval: data structures and algorithms
Overview of the first TREC conference
SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic text decomposition using text segments and text themes
Proceedings of the the seventh ACM conference on Hypertext
Using sentence-selection heuristics to rank text segments in TXTRACTOR
Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
Modern Information Retrieval
A critique and improvement of an evaluation metric for text segmentation
Computational Linguistics
Topic segmentation: algorithms and applications
Topic segmentation: algorithms and applications
Lexical cohesion computed by thesaural relations as an indicator of the structure of text
Computational Linguistics
TextTiling: segmenting text into multi-paragraph subtopic passages
Computational Linguistics
Advances in domain independent linear text segmentation
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Text segmentation based on similarity between words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
A statistical model for domain-independent text segmentation
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Discourse segmentation of multi-party conversation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Hi-index | 0.00 |
This paper describes ClassStruggle, an algorithm for linear text segmentation on general corpuses. It relies on an initial clustering of the sentences of the text. This preliminary partitioning provides a global view on the sentences relations existing in the text, considering the similarities in a group rather than individually. ClassStruggle is based on the distribution of the occurrences of the members of each class. During the process, the clusters then evolve, by considering a notion of proximity and of layout in the text, in the aim to create groups that contain only sentences related to a same topic development. Finally, boundaries are created between sentences belonging to two different classes. First experimental results are promising, ClassStruggle appears to be very competitive compared with existing methods.