Elements of information theory
Elements of information theory
Robust automated topic identification
Robust automated topic identification
Text segmentation based on similarity between words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Hi-index | 0.00 |
Many Natural Language Processing applications require semantic knowledge about topics in order to be possible or to be efficient. So we developed a system, SEGAPSITH, that acquires it automatically from text segments by using an unsupervised and incremental clustering method. In such an approach, an important problem consists of the validation of the learned classes. To do that, we applied another clustering method, that only needs to know the number of classes to build, on the same subset of text segments and we reformulate our evaluation problem in comparing the two classifications. So, we established different criteria to compare them, based either on the words as class descriptors or on the thematic units. Our first results lead to show a great correlation between the two classifications.