A new efficient and unbiased approach for clustering quality evaluation

  • Authors:
  • Jean-Charles Lamirel;Pascal Cuxac;Raghvendra Mall;Ghada Safi

  • Affiliations:
  • LORIA, Vandœuvre-lès-Nancy, France;INIST-CNRS, Vandœuvre-lès-Nancy, France;Center of Data Engineering, IIIT Hyderabad, Hyderabad, Andhra Pradesh, India;Department of Mathematics, Faculty of Science, Aleppo University, Aleppo, Syria

  • Venue:
  • PAKDD'11 Proceedings of the 15th international conference on New Frontiers in Applied Data Mining
  • Year:
  • 2011

Abstract

Traditional quality indexes (Inertia, DB, …) are known to be method-dependent and fail to estimate clustering quality properly in several cases, notably for complex data such as text. We therefore propose an alternative approach to clustering quality evaluation based on unsupervised measures of Recall, Precision and F-measure that exploit the descriptors of the data associated with the obtained clusters. Two categories of index are proposed: Macro and Micro indexes. This paper also focuses on the construction of a new cumulative Micro precision index that makes it possible to evaluate the overall quality of a clustering result while clearly distinguishing homogeneous results from heterogeneous, or degenerate, ones. The behavior of the classical indexes is compared experimentally with that of our new approach on a polythematic dataset of bibliographical references drawn from the PASCAL database.
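
The abstract does not state the formulas, so the following is only a minimal sketch of how descriptor-based, unsupervised Recall, Precision and F-measure might be computed, and of how Macro and Micro averaging could differ. Everything in it is an assumption made for illustration rather than the paper's definition: the function name `descriptor_quality`, the rule attributing a descriptor to the cluster in which it is strictly most frequent, the choice to skip clusters that own no descriptor, and the two averaging schemes. The cumulative Micro precision index is not covered.

```python
from collections import Counter


def f_measure(recall, precision):
    """Harmonic mean of recall and precision (0.0 when both are 0)."""
    total = recall + precision
    return 2 * recall * precision / total if total else 0.0


def descriptor_quality(items, labels):
    """Hypothetical Macro/Micro (recall, precision, F) for a clustering.

    items  : list of sets of descriptors, one set per data item
    labels : list of cluster ids, aligned with items
    """
    clusters = sorted(set(labels))
    size = Counter(labels)
    # counts[c][p] = number of items of cluster c that carry descriptor p
    counts = {c: Counter() for c in clusters}
    for descs, c in zip(items, labels):
        counts[c].update(descs)
    total = Counter()
    for c in clusters:
        total.update(counts[c])

    per_cluster = []  # one (mean recall, mean precision) per cluster
    pairs = []        # one (recall, precision) per (cluster, descriptor)
    for c in clusters:
        # Assumption: a descriptor belongs to the cluster where it is
        # strictly more frequent than in every other cluster.
        own = [p for p, n in counts[c].items()
               if all(n > counts[o][p] for o in clusters if o != c)]
        if not own:
            continue  # assumption: such clusters are left out
        scores = [(counts[c][p] / total[p], counts[c][p] / size[c])
                  for p in own]
        pairs.extend(scores)
        per_cluster.append((sum(r for r, _ in scores) / len(scores),
                            sum(q for _, q in scores) / len(scores)))

    def summarize(values):
        if not values:
            return (0.0, 0.0, 0.0)
        r = sum(v[0] for v in values) / len(values)
        p = sum(v[1] for v in values) / len(values)
        return (r, p, f_measure(r, p))

    # Macro: average the per-cluster means. Micro: average all pairs.
    return {"macro": summarize(per_cluster), "micro": summarize(pairs)}


if __name__ == "__main__":
    items = [{"a", "b"}, {"a"}, {"a", "c"}, {"c"}]
    labels = [0, 0, 1, 1]
    print(descriptor_quality(items, labels))
```

In this sketch the Macro scores weight every cluster equally, whereas the Micro scores weight every (cluster, descriptor) pair equally, so clusters owning many descriptors count for more in the Micro average.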