Automatic text segmentation consists of identifying the most important thematic breaks in a document in order to cut it into homogeneous passages. The task has motivated a large amount of research. We focus here on statistical approaches that rely on an analysis of the distribution of words in the text. Segmentation is usually performed sequentially, on the basis of very local clues. Such an approach, however, prevents the text from being considered globally, in particular with respect to the level of granularity at which it expresses its different topics. We therefore propose two new segmentation algorithms, ClassStruggle and SegGen, which use criteria that take a global view of the text. ClassStruggle starts from an initial clustering of the sentences of the text, so that similarities are considered within a group rather than between individual sentences. It then relies on the distribution of the occurrences of the members of each class to segment the text. SegGen uses a genetic algorithm to evaluate potential segmentations of the whole text. It searches for a segmentation that optimizes two criteria: maximizing the internal cohesion of the segments and minimizing the similarity between adjacent ones. According to experimental results, both approaches appear to be very competitive with existing methods.
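To make the two SegGen criteria concrete, the following is a minimal sketch of how one candidate segmentation might be scored. It is an illustration under stated assumptions, not the authors' formulation: sentences are represented as bag-of-words vectors compared by cosine similarity, a single-sentence segment is given a cohesion of 0.0, and the function names (`cosine`, `split_at`, `score_segmentation`) are hypothetical. The genetic search over candidate segmentations is deliberately left out.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(values):
    """Arithmetic mean; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def split_at(items, boundaries):
    """Cut items into segments; each boundary i places a break before item i."""
    cuts = [0, *sorted(boundaries), len(items)]
    return [items[start:end] for start, end in zip(cuts, cuts[1:])]


def score_segmentation(sentences, boundaries):
    """Return (internal cohesion, adjacent similarity) for one candidate.

    Assumptions of this sketch (not taken from the source):
    - cohesion of a segment = mean pairwise cosine of its sentences,
      with 0.0 for a single-sentence segment;
    - adjacent similarity = cosine between the pooled word counts of
      neighbouring segments.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    segments = split_at(vectors, boundaries)

    cohesion = mean([
        mean([cosine(a, b) for a, b in combinations(segment, 2)])
        for segment in segments
    ])

    pooled = [sum(segment, Counter()) for segment in segments]
    adjacent = mean([cosine(a, b) for a, b in zip(pooled, pooled[1:])])
    return cohesion, adjacent


sentences = [
    "cats purr softly",
    "cats sleep often",
    "stocks fell sharply",
    "stocks rose sharply",
]
good = score_segmentation(sentences, [2])  # break between the two topics
poor = score_segmentation(sentences, [1])  # break inside the first topic
print(good, poor)
```

A search procedure such as a genetic algorithm would prefer the first candidate here, since it has both higher internal cohesion and lower similarity between adjacent segments. How the two criteria are combined or traded off against each other is not specified in the abstract, so the sketch returns them separately.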