Clustering quality measures for data samples with multiple labels

  • Authors:
  • Mohammed Attik;Shadi Al Shehabi;Jean-Charles Lamirel

  • Affiliations:
  • LORIA, Vandœuvre-lès-Nancy, France;LORIA, Vandœuvre-lès-Nancy, France;LORIA, Vandœuvre-lès-Nancy, France

  • Venue:
  • DBA'06 Proceedings of the 24th IASTED international conference on Database and applications
  • Year:
  • 2006

Abstract

This paper focuses on the problem of classifying data that are associated with multiple labels. It deals especially with the case where no label has an antagonistic counterpart, and where the absence of a label for a data item does not necessarily imply that the item cannot carry that label. Examples include the substances found in mineral exploration and the keywords of Web pages. We propose new clustering quality measures adapted to data associated with multiple labels. These measures rely on two main kinds of information: the similarity between data items as given by the clustering algorithm, and the distribution of the labels in the model after the labels are projected onto the classification model. Their main area of application is the clustering model selection problem. They can also be used to define a stopping criterion for training the clustering algorithm. An experiment applying the proposed measures to documentary data analysis shows that they significantly outperform state-of-the-art measures.
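The abstract does not give the formulas for the proposed measures. As a rough illustration of what "projecting labels onto a clustering model" can mean, the sketch below counts, for each cluster, how many of its members carry each label, and then computes a simple size-weighted multi-label purity. This is a generic, hypothetical measure chosen for illustration; it is not the authors' method, and it ignores the similarity component the abstract mentions. The function names `project_labels` and `multilabel_purity` are assumptions. Consistent with the abstract's setting, only label presence is counted: a missing label is never treated as evidence against that label.

```python
from collections import Counter, defaultdict


def project_labels(assignments, labels):
    """Project item labels onto clusters (hypothetical helper).

    assignments: list of cluster ids, one per data item.
    labels: list of label sets, one per data item.
    Returns (sizes, counts): sizes[c] is the number of items in cluster c,
    counts[c][l] is the number of items in cluster c that carry label l.
    """
    sizes = Counter(assignments)
    counts = defaultdict(Counter)
    for cluster, item_labels in zip(assignments, labels):
        # Each label present on the item adds 1; absent labels add nothing.
        counts[cluster].update(item_labels)
    return sizes, counts


def multilabel_purity(assignments, labels):
    """Size-weighted fraction of items carrying their cluster's most frequent label.

    A generic illustration only, not the measure proposed in the paper.
    Returns a value in [0, 1]; an empty input yields 0.0.
    """
    sizes, counts = project_labels(assignments, labels)
    total = sum(sizes.values())
    if total == 0:
        return 0.0
    hits = 0
    for cluster in sizes:
        top = counts[cluster].most_common(1)  # [] if no item in the cluster has a label
        hits += top[0][1] if top else 0
    return hits / total


if __name__ == "__main__":
    # Toy data (invented for illustration): substances as labels.
    assignments = [0, 0, 0, 1, 1]
    labels = [{"gold", "copper"}, {"gold"}, {"copper"}, {"zinc"}, {"zinc", "lead"}]
    print(multilabel_purity(assignments, labels))  # 0.8
```

In the toy example, cluster 0 has a most frequent label shared by 2 of its 3 items and cluster 1 has one shared by both of its items, giving (2 + 2) / 5 = 0.8. A measure of this kind could be compared across candidate clusterings for model selection, although the paper's own measures also combine label distribution with the algorithm's similarity information.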