Clustering analysis for data samples with multiple labels

Authors:
Mohammed Attik;Shadi Al Shehabi;Jean-Charles Lamirel
Affiliations:
LORIA, Vandœuvre-lès-Nancy, France;LORIA, Vandœuvre-lès-Nancy, France;LORIA, Vandœuvre-lès-Nancy, France
Venue:
DBA'06 Proceedings of the 24th IASTED international conference on Database and applications
Year:
2006

Abstract

This paper presents a new clustering analysis approach for data samples that carry multiple labels. It especially addresses the case where no label has an antagonistic counterpart, and where the absence of a label on a data sample does not necessarily imply that the sample cannot carry that label, e.g., the substances found in mineral exploration or the keywords of Web pages. The proposed approach relies on two analyses conducted in parallel: cluster analysis and label analysis. The cluster analysis aims at selecting the most interesting or relevant clusters. The label analysis aims at classifying the labels both into specific categories, such as implicit, explicit, noisy, and novel, and into more general embedding categories, namely relevant and irrelevant. The proposed analysis methods rely on two main sources of information: the similarity between the data given by the clustering algorithm, and the distribution of the labels over the model once these labels have been projected onto the classification model. Moreover, these methods make use of original quality measures for performing both the label and the cluster analyses. An experiment on documentary data highlights the accuracy of the proposed approach.
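The step of projecting labels onto a clustering model can be illustrated with a minimal sketch. It assumes a hard clustering (one cluster id per sample) and represents each sample's labels as a set. The function names `project_labels`, `label_in_cluster_share`, and `label_across_clusters_share` are hypothetical, and the two ratios they compute are generic illustrations of a label distribution, not the quality measures defined in the paper. In keeping with the setting described above, a missing label is simply not counted; it is never treated as a negative observation.

```python
from collections import Counter, defaultdict


def project_labels(assignments, sample_labels):
    """Count how often each label occurs in each cluster (hypothetical helper).

    assignments: cluster id for each sample, in the same order as sample_labels.
    sample_labels: a set of labels for each sample (may be empty; an absent
    label contributes nothing rather than counting as a negative).
    """
    if len(assignments) != len(sample_labels):
        raise ValueError("one cluster id is needed per sample")
    counts = defaultdict(Counter)
    for cluster, labels in zip(assignments, sample_labels):
        counts[cluster].update(labels)
    return dict(counts)


def label_in_cluster_share(counts):
    """For each cluster, the fraction of its label occurrences held by each label."""
    shares = {}
    for cluster, c in counts.items():
        total = sum(c.values())
        shares[cluster] = {lab: n / total for lab, n in c.items()} if total else {}
    return shares


def label_across_clusters_share(counts):
    """For each label, the fraction of its occurrences that fall into each cluster."""
    totals = Counter()
    for c in counts.values():
        totals.update(c)
    # Counter returns 0 for a missing key without inserting it, so clusters
    # that never contain a label are skipped.
    return {
        lab: {cl: c[lab] / totals[lab] for cl, c in counts.items() if c[lab]}
        for lab in totals
    }


if __name__ == "__main__":
    # Toy data: four samples in two clusters; the last sample has no label.
    assignments = [0, 0, 1, 1]
    sample_labels = [{"gold", "copper"}, {"gold"}, {"copper"}, set()]
    counts = project_labels(assignments, sample_labels)
    print(label_in_cluster_share(counts))
    print(label_across_clusters_share(counts))
```

In this toy example, "gold" occurs only in cluster 0, while "copper" is split evenly between the two clusters. A label analysis could interpret such a contrast as a sign of how specific each label is to a cluster, although the paper's own criteria for categories such as implicit, explicit, noisy, or novel are not reproduced here.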