A new efficient and unbiased approach for clustering quality evaluation

Authors:
Jean-Charles Lamirel;Pascal Cuxac;Raghvendra Mall;Ghada Safi
Affiliations:
LORIA, Vandœuvre-lès-Nancy, France;INIST-CNRS, Vandœuvre-lès-Nancy, France;Center of Data Engineering, IIIT Hyderabad, Hyderabad, Andhra Pradesh, India;Department of Mathematics, Faculty of Science, Aleppo University, Aleppo, Syria
Venue:
PAKDD'11 Proceedings of the 15th international conference on New Frontiers in Applied Data Mining
Year:
2011

Abstract

Traditional quality indexes (Inertia, DB, …) are known to be method-dependent and fail to properly estimate the quality of a clustering in several cases, notably for complex data such as textual data. We therefore propose an alternative approach to clustering quality evaluation based on unsupervised measures of Recall, Precision and F-measure that exploit the descriptors of the data associated with the obtained clusters. Two categories of indexes are proposed: Macro and Micro indexes. This paper also focuses on the construction of a new cumulative Micro precision index that makes it possible to evaluate the overall quality of a clustering result while clearly distinguishing between homogeneous and heterogeneous, or degenerate, results. The behavior of the classical indexes is compared experimentally with that of our new approach on a polythematic dataset of bibliographical references drawn from the PASCAL database.
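
The abstract does not give the formulas, so the following is only a minimal illustrative sketch of how descriptor-based Recall, Precision and F-measure could be computed without ground-truth labels, in Macro and Micro variants. It rests on assumptions that are not taken from the paper: each descriptor is attributed to the cluster in which it occurs most often, Recall is the share of a descriptor's occurrences that fall in that cluster, and Precision is the share of the cluster's items carrying the descriptor. The function name `descriptor_quality` and the data layout are hypothetical, and the cumulative Micro precision index is not covered.

```python
from collections import Counter, defaultdict


def descriptor_quality(docs, labels):
    """Illustrative descriptor-based quality scores (assumed definitions).

    docs   : list of sets of descriptors, one set per data item
    labels : cluster id of each data item (same order as docs)
    """
    cluster_size = Counter(labels)
    total = Counter()               # descriptor -> number of items carrying it
    within = defaultdict(Counter)   # descriptor -> cluster -> item count
    for descriptors, c in zip(docs, labels):
        for p in descriptors:
            total[p] += 1
            within[p][c] += 1

    # Assumption: a descriptor belongs to the cluster where it is most
    # frequent; ties go to the smallest cluster id.
    per_cluster = defaultdict(list)  # cluster -> [(recall, precision), ...]
    for p, counts in within.items():
        c, n = max(sorted(counts.items()), key=lambda kv: kv[1])
        per_cluster[c].append((n / total[p], n / cluster_size[c]))

    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0

    def f_measure(r, p):
        return 2 * r * p / (r + p) if r + p else 0.0

    # Macro: average per cluster first, then across clusters that own
    # at least one descriptor.
    macro_r = mean(mean(r for r, _ in rps) for rps in per_cluster.values())
    macro_p = mean(mean(p for _, p in rps) for rps in per_cluster.values())

    # Micro: average directly over all attributed descriptors.
    pairs = [rp for rps in per_cluster.values() for rp in rps]
    micro_r = mean(r for r, _ in pairs)
    micro_p = mean(p for _, p in pairs)

    return {
        "macro_recall": macro_r,
        "macro_precision": macro_p,
        "macro_f": f_measure(macro_r, macro_p),
        "micro_recall": micro_r,
        "micro_precision": micro_p,
        "micro_f": f_measure(micro_r, micro_p),
    }


if __name__ == "__main__":
    docs = [{"a", "b"}, {"a"}, {"c"}, {"c", "b"}]
    labels = [0, 0, 1, 1]
    for name, value in descriptor_quality(docs, labels).items():
        print(f"{name}: {value:.3f}")
```

In this sketch the Macro scores weight every cluster equally, whereas the Micro scores weight every descriptor equally, so a clustering with one large heterogeneous cluster and many tiny ones can score differently under the two schemes.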