A density-based cluster validity approach using multi-representatives

  • Authors:
  • Maria Halkidi; Michalis Vazirgiannis

  • Affiliations:
  • Department of Informatics, Athens University of Economics and Business, 76 Patision Street, Athens 104 34, Greece (both authors)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2008

Abstract

Although the goal of clustering is intuitively compelling and the notion arises in many fields, it is difficult to define a unified approach to the clustering problem, and so diverse clustering algorithms abound in the research community. Under different clustering assumptions, these algorithms often lead to qualitatively different results. Consequently, the results of clustering algorithms (i.e., partitionings of a data set) need to be evaluated for validity against widely accepted criteria. This paper proposes a cluster validity index, CDbw, which assesses the compactness and separation of the clusters defined by a clustering algorithm. Given a data set and a set of clustering algorithms, the index enables (i) the selection of the input parameter values that lead an algorithm to the best possible partitioning of the data set, and (ii) the selection of the algorithm that provides the best partitioning of the data set. CDbw handles arbitrarily shaped clusters efficiently by representing each cluster with a number of points rather than a single representative point. A full implementation and experimental results confirm the reliability of the validity index and show that its performance compares favourably with that of several other indices.
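
To make the multi-representative idea concrete, the following is a minimal Python sketch of one way to pick several well-scattered points from a single cluster. The selection rule (farthest-first traversal, seeded by the point farthest from the centroid) and the helper name `farthest_first_representatives` are assumptions made for illustration; the abstract does not state how CDbw chooses its representatives or how it combines them into a score.

```python
import math


def farthest_first_representatives(points, r):
    """Return up to r well-scattered points from one cluster.

    Hypothetical helper for illustration only; not the procedure
    specified by the paper. `points` is a list of equal-length
    coordinate tuples, `r` the number of representatives wanted.
    """
    if not points or r <= 0:
        return []

    dim = len(points[0])
    centroid = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

    # Seed with the point farthest from the centroid.
    reps = [max(points, key=lambda p: math.dist(p, centroid))]

    while len(reps) < min(r, len(points)):
        # Distance from a point to its nearest already-chosen representative.
        def gap(p):
            return min(math.dist(p, q) for q in reps)

        candidate = max(points, key=gap)
        if gap(candidate) == 0:
            # Every remaining point coincides with a representative.
            break
        reps.append(candidate)

    return reps


# An L-shaped cluster: a single centroid would fall outside both arms,
# whereas several representatives can trace the shape.
l_shape = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (0, 1), (0, 2), (0, 3)]
print(farthest_first_representatives(l_shape, 3))
```

On the L-shaped example the chosen points lie on the two arms of the cluster, which is the property a single representative point cannot capture for arbitrarily shaped clusters.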