A comprehensive validity index for clustering

  • Authors:
  • S. Saitta; B. Raphael; I. F. C. Smith

  • Affiliations:
  • (Corresponding author. Tel.: +41 21 693 63 72 / Fax: +41 21 693 47 48 / E-mail: sandro.saitta@gmail.com) Ecole Polytechnique Fédérale de Lausanne (EPFL), Station 18, Lausanne, Switzerland; National University of Singapore, 117566, Singapore; Ecole Polytechnique Fédérale de Lausanne (EPFL), Station 18, Lausanne, Switzerland

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2008

Abstract

Cluster validity indices are used both to estimate the quality of a clustering algorithm and to determine the correct number of clusters in data. Although several indices exist in the literature, most of them are relevant only for data sets that contain at least two clusters. This paper introduces a new bounded index for cluster validity, called the score function (SF): a double exponential expression based on a ratio of standard cluster parameters. Several artificial and real-life data sets are used to evaluate the performance of the score function. These data sets contain a range of features and patterns, such as unbalanced, overlapping and noisy clusters. In addition, cases involving sub-clusters and perfect clusters are tested. The score function is compared against six previously proposed validity indices. For hyper-spheroidal clusters, the proposed index is always found to be as good as or better than these indices. It is also shown to work well on multidimensional and noisy data sets. One of its advantages is its ability to handle the single-cluster case and sub-cluster hierarchies.
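The abstract describes SF only as a double exponential of a ratio of standard cluster parameters and does not define those parameters. The Python sketch below assumes, for illustration only, the form SF = 1 - exp(-exp(bcd / wcd)), with hypothetical definitions: bcd is the size-weighted sum of distances from each cluster centroid to the global centroid, divided by n·k, and wcd is the sum over clusters of the mean point-to-centroid distance. The paper's exact definitions may differ. Under these assumptions SF is bounded between 1 - 1/e and 1, and a single cluster gives bcd = 0 and SF = 1 - 1/e, so the value remains defined for k = 1.

```python
import math
from collections import defaultdict


def _mean(points):
    """Component-wise centroid of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def _dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def score_function(points, labels):
    """Hypothetical sketch of a bounded double-exponential validity index.

    Assumed form (not taken from the paper): SF = 1 - exp(-exp(bcd / wcd)).
    """
    # Group points by their cluster label.
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)

    n, k = len(points), len(clusters)
    z_tot = _mean(points)
    centroids = {lab: _mean(ps) for lab, ps in clusters.items()}

    # Assumed between-cluster term: size-weighted centroid spread, divided by n*k.
    bcd = sum(len(ps) * _dist(centroids[lab], z_tot)
              for lab, ps in clusters.items()) / (n * k)

    # Assumed within-cluster term: sum of each cluster's mean distance to its centroid.
    wcd = sum(sum(_dist(p, centroids[lab]) for p in ps) / len(ps)
              for lab, ps in clusters.items())

    # Sketch choice: zero within-cluster spread ("perfect" clusters) maps to the upper bound.
    if wcd == 0.0:
        return 1.0

    ratio = bcd / wcd
    # 1 - 1/e^(e^r) rewritten as 1 - e^(-e^r); clamp r so math.exp does not overflow.
    return 1.0 - math.exp(-math.exp(min(ratio, 700.0)))


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(score_function(pts, [0, 0, 0, 0, 1, 1, 1, 1]))  # two tight groups
print(score_function(pts, [0, 1, 0, 1, 0, 1, 0, 1]))  # interleaved labels
print(score_function(pts, [0] * 8))                   # single cluster
```

On these eight 2-D points, the labeling that matches the two tight groups scores close to 1, the interleaved labeling scores only slightly above 1 - 1/e, and the single-cluster labeling gives exactly 1 - 1/e. Clamping the ratio at 700 and returning 1.0 when wcd is zero are choices made for this sketch to avoid floating-point overflow and division by zero.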