Clustering analysis for data samples with multiple labels

  • Authors:
  • Mohammed Attik;Shadi Al Shehabi;Jean-Charles Lamirel

  • Affiliations:
  • LORIA, Vandœuvre-lès-Nancy, France;LORIA, Vandœuvre-lès-Nancy, France;LORIA, Vandœuvre-lès-Nancy, France

  • Venue:
  • DBA'06 Proceedings of the 24th IASTED international conference on Database and applications
  • Year:
  • 2006

Abstract

This paper presents a new clustering analysis approach based on data samples with multiple labels. It deals in particular with the case where no label has an antagonistic label and where the absence of a label on a data sample does not necessarily imply that the sample cannot carry that label, e.g. substances in mineral exploration or keywords of Web pages. The proposed approach relies on two analyses conducted in parallel: cluster analysis and label analysis. The cluster analysis aims at selecting the most interesting or relevant clusters. The label analysis aims at classifying the labels both into specific categories (implicit, explicit, noisy and novel) and into two more general, encompassing categories (relevant and irrelevant). The proposed analysis methods rely on two main sources of information: the similarity between the data samples given by the clustering algorithm, and the distribution of the labels in the model after these labels are projected onto the classification model. Moreover, these methods make use of original quality measures for performing both the label and the cluster analyses. An experiment on documentary data highlights the accuracy of the proposed approach.
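
To make the idea of projecting labels onto a clustering model more concrete, the following is a minimal sketch and not the authors' method. It assumes a hard clustering is already available and simply counts, per cluster, how often each label occurs among that cluster's samples. The function names `project_labels` and `label_distribution`, the toy documents and the labels are all hypothetical, and the paper's own quality measures are not reproduced here because the abstract does not define them.

```python
from collections import Counter, defaultdict


def project_labels(cluster_of, labels_of):
    """Count label occurrences per cluster (hypothetical helper).

    cluster_of: maps each sample id to its cluster id.
    labels_of:  maps each sample id to the set of labels it carries;
                a sample may carry zero, one or several labels.
    """
    counts = defaultdict(Counter)
    for sample, cluster in cluster_of.items():
        for label in labels_of.get(sample, ()):
            counts[cluster][label] += 1
    return counts


def label_distribution(counts, label):
    """Share of a label's occurrences that falls in each cluster.

    Clusters where the label never occurs are omitted. This is a plain
    relative frequency, used only as an illustration; it is an assumption,
    not one of the quality measures proposed in the paper.
    """
    total = sum(c[label] for c in counts.values())
    if total == 0:
        return {}
    return {cl: c[label] / total for cl, c in counts.items() if c[label]}


# Toy data (invented for illustration): four documents in two clusters.
cluster_of = {"d1": 0, "d2": 0, "d3": 1, "d4": 1}
labels_of = {
    "d1": {"gold", "copper"},
    "d2": {"gold"},
    "d3": {"copper"},
    "d4": set(),  # an unlabeled sample: absence of a label is not a negative
}

counts = project_labels(cluster_of, labels_of)
gold = label_distribution(counts, "gold")
copper = label_distribution(counts, "copper")
```

In this toy setting, a label concentrated in a single cluster (here "gold") and a label spread evenly across clusters (here "copper") give visibly different distributions, which is the kind of signal a label analysis could build on.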