A cross-comparison of two clustering methods

  • Authors:
  • Olivier Ferret;Brigitte Grau;Michèle Jardino

  • Affiliations:
  • CEA Saclay, DTI/SITI, Gif-sur-Yvette Cedex;LIMSI CNRS, Orsay, France;LIMSI CNRS, Orsay, France

  • Venue:
  • ELDS '01 Proceedings of the workshop on Evaluation for Language and Dialogue Systems - Volume 9
  • Year:
  • 2001

Abstract

Many Natural Language Processing applications need semantic knowledge about topics, either to be feasible at all or to run efficiently. We therefore developed a system, SEGAPSITH, that acquires this knowledge automatically from text segments using an unsupervised, incremental clustering method. An important problem in such an approach is validating the learned classes. To address it, we applied a second clustering method, which only needs to be given the number of classes to build, to the same subset of text segments, and we recast the evaluation problem as a comparison of the two classifications. We established several criteria for this comparison, based either on the words used as class descriptors or on the thematic units. Our first results show a strong correlation between the two classifications.
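
As an illustration of what comparing two classifications of the same text segments can look like, here is a minimal Python sketch. The abstract does not state the paper's actual criteria, so the measures below (a contingency table, the Rand index over segment pairs, and Jaccard overlap between descriptor word sets) are assumptions chosen for illustration, and all function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def contingency(labels_a, labels_b):
    """Count how many segments fall into each (class_a, class_b) pair."""
    return Counter(zip(labels_a, labels_b))


def rand_index(labels_a, labels_b):
    """Fraction of segment pairs on which the two classifications agree.

    A pair agrees when both classifications put the two segments together,
    or both put them apart. This is a stand-in measure, not the paper's.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def jaccard(words_a, words_b):
    """Overlap between the descriptor word sets of two classes."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Hypothetical toy data: five segments labelled by each method.
incremental = ["t1", "t1", "t2", "t2", "t3"]  # incremental clustering
fixed_k = ["A", "A", "B", "B", "B"]           # method given 2 classes

table = contingency(incremental, fixed_k)
score = rand_index(incremental, fixed_k)
overlap = jaccard(["bank", "money", "loan"], ["bank", "money", "river"])
```

The segment-level measures correspond loosely to a comparison based on thematic units, and the word-set overlap to a comparison based on class descriptors; a real evaluation would substitute the criteria defined in the paper.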