A cross-comparison of two clustering methods

Authors:
Olivier Ferret;Brigitte Grau;Michèle Jardino
Affiliations:
CEA Saclay, DTI/SITI, Gif-sur-Yvette Cedex;LIMSI CNRS, Orsay, France;LIMSI CNRS, Orsay, France
Venue:
ELDS '01 Proceedings of the workshop on Evaluation for Language and Dialogue Systems - Volume 9
Year:
2001

Abstract

Many Natural Language Processing applications require semantic knowledge about topics, either to be feasible at all or to run efficiently. We therefore developed a system, SEGAPSITH, that acquires this knowledge automatically from text segments using an unsupervised, incremental clustering method. An important problem with such an approach is validating the learned classes. To address it, we applied a second clustering method, which only needs to know the number of classes to build, to the same subset of text segments, and recast the evaluation problem as a comparison of the two classifications. We established several criteria for this comparison, based either on the words that describe the classes or on the thematic units. Our first results show a strong correlation between the two classifications.
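
As an illustration only, comparing two classifications of the same text segments can be sketched with a contingency table and a simple best-match agreement score. The abstract does not specify the paper's actual criteria, so this is not the authors' measure. The function names, the class labels, and the toy data below are all hypothetical.

```python
from collections import Counter


def contingency(labels_a, labels_b):
    """Count the segments falling in each (class in A, class in B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifications must cover the same segments")
    return Counter(zip(labels_a, labels_b))


def best_match_agreement(labels_a, labels_b):
    """Fraction of segments lying in the best-overlapping B class of their A class.

    Assumption: this purity-style score is an illustrative stand-in,
    not the comparison criterion used in the paper.
    """
    if not labels_a:
        raise ValueError("at least one segment is required")
    table = contingency(labels_a, labels_b)
    best = {}  # A class -> size of its largest overlap with any B class
    for (a, _b), n in table.items():
        if n > best.get(a, 0):
            best[a] = n
    return sum(best.values()) / len(labels_a)


# Hypothetical toy data: six text segments, each labelled by both methods.
incremental = ["t1", "t1", "t1", "t2", "t2", "t3"]  # incremental clustering
fixed_k = ["c2", "c2", "c1", "c1", "c1", "c3"]      # method given the class count
score = best_match_agreement(incremental, fixed_k)
```

A score of 1.0 means every class of the first classification sits entirely inside one class of the second. The score is asymmetric, so in practice it would be computed in both directions.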