This paper presents a domain-independent approach for partitioning text documents into topic-coherent segments, where the segment structure reflects the subtopic patterns of the processed document. The approach uses a similarity analysis based on Shannon's information theory to determine how topics are distributed within documents, without relying on thesaurus information or other auxiliary knowledge bases. It first examines each document in terms of how consistently individual words are distributed across it, and then constructs a number of segmentation proposals accordingly. It then applies the K-means clustering technique to reach a consensus among these proposals and finally partitions the text into a set of topic-coherent paragraphs. Extensive experiments on real and synthetic data sources show the effectiveness of the approach for text segmentation.
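The abstract does not say how the per-word proposals are formed or how K-means is applied to them, so the following is only a hypothetical sketch of the general idea, not the authors' algorithm. It scores each word by the Shannon entropy of its occurrences across paragraphs. Low-entropy (topically concentrated) words then propose boundaries at the edges of the paragraph span they occupy. Finally, all proposed boundaries are pooled and clustered with one-dimensional K-means to obtain consensus boundaries. The function names, the span-edge proposal rule, the normalized-entropy threshold `max_norm_entropy`, `min_count`, and the quantile-based initialization are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def word_entropy(counts):
    """Shannon entropy (in bits) of one word's occurrence counts over paragraphs."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def segmentation_proposals(paragraphs, max_norm_entropy=0.6, min_count=2):
    """Hypothetical proposal rule (an assumption, not from the paper): every
    topically concentrated word proposes boundaries at the edges of the
    paragraph span in which it occurs.

    A boundary is a gap index g (1 <= g <= n-1) between paragraphs g-1 and g.
    """
    n = len(paragraphs)
    if n < 2:
        return []  # nothing to segment
    per_word = defaultdict(lambda: [0] * n)
    for i, para in enumerate(paragraphs):
        for word, c in Counter(re.findall(r"[a-z]+", para.lower())).items():
            per_word[word][i] += c
    proposals = []
    for counts in per_word.values():
        if sum(counts) < min_count:
            continue  # too rare to say anything about topic distribution
        # Divide by the maximum possible entropy, log2(n), to map into [0, 1].
        if word_entropy(counts) / math.log2(n) > max_norm_entropy:
            continue  # spread evenly over the document: not topic-specific
        present = [i for i, c in enumerate(counts) if c > 0]
        gaps = {g for g in (present[0], present[-1] + 1) if 0 < g < n}
        if gaps:
            proposals.append(sorted(gaps))
    return proposals


def kmeans_1d(points, k, iters=50):
    """Plain 1-D K-means (Lloyd's algorithm) with quantile-based initialization."""
    pts = sorted(points)
    centroids = [pts[(2 * j + 1) * len(pts) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:
            break  # converged
        centroids = new
    return centroids


def consensus_boundaries(proposals, k):
    """Pool all proposed gaps and let K-means choose up to k consensus boundaries."""
    pooled = [g for proposal in proposals for g in proposal]
    if not pooled or k < 1:
        return []
    k = min(k, len(set(pooled)))
    return sorted({round(c) for c in kmeans_1d(pooled, k)})


if __name__ == "__main__":
    paras = ["cats purr cats nap", "cats nap purr",
             "rockets launch rockets orbit", "rockets orbit launch",
             "bread dough bread oven", "bread oven dough"]
    print(consensus_boundaries(segmentation_proposals(paras), k=2))  # → [2, 4]
```

On this toy input of three two-paragraph topics, the consensus boundaries fall at gaps 2 and 4, which splits the text into paragraphs 0-1, 2-3 and 4-5. Pooling proposals into 1-D K-means is just one simple way to form a consensus. The number of clusters, the entropy threshold, and the lack of a stoplist are choices that would need tuning on real documents.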