A Bounded Index for Cluster Validity

  • Authors:
  • Sandro Saitta; Benny Raphael; Ian F. Smith

  • Affiliations:
  • Ecole Polytechnique Fédérale de Lausanne (EPFL), Station 18, 1015 Lausanne, Switzerland (all authors)

  • Venue:
  • MLDM '07: Proceedings of the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2007

Abstract

Clustering is one of the best-known types of unsupervised learning. Evaluating the quality of clustering results and determining the number of clusters in the data are important issues. Most current validity indices cover only a subset of the important aspects of clusters. Moreover, these indices are relevant only for data sets containing at least two clusters. In this paper, a new bounded index for cluster validity, called the score function (SF), is introduced. The score function is based on standard cluster properties. Several artificial and real-life data sets are used to evaluate its performance, and it is tested against four existing validity indices. For hyperspheroidal clusters, the proposed index is found to be always as good as or better than these indices. It is shown to work well on multi-dimensional data sets and is able to accommodate the unique (single-cluster) and sub-cluster cases.
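
As a rough illustration of how a bounded index can be built from standard cluster properties, the Python sketch below combines a between-cluster distance and a within-cluster distance and squashes their difference into the interval from 0 to 1. The abstract does not give the definition of SF, so the function name `bounded_score`, the two distance terms, and the double-exponential squashing are assumptions made for illustration and are not necessarily the formula used in the paper.

```python
import numpy as np


def bounded_score(points, labels):
    """Illustrative bounded validity score; higher suggests compact, well-separated clusters.

    Assumed form (not taken from the abstract):
    score = 1 - 1 / exp(exp(between - within)).
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    n = len(points)
    cluster_ids = np.unique(labels)
    k = len(cluster_ids)
    overall_centroid = points.mean(axis=0)

    between = 0.0  # size-weighted distance of cluster centroids to the overall centroid
    within = 0.0   # sum over clusters of the mean distance of members to their centroid
    for cid in cluster_ids:
        members = points[labels == cid]
        centroid = members.mean(axis=0)
        between += np.linalg.norm(centroid - overall_centroid) * len(members)
        within += np.linalg.norm(members - centroid, axis=1).mean()
    between /= n * k

    # exp(exp(x)) is always greater than 1, so the result cannot leave [0, 1].
    # For very large differences the inner value overflows to inf and the score saturates at 1.
    with np.errstate(over="ignore"):
        return float(1.0 - 1.0 / np.exp(np.exp(between - within)))


# Hypothetical toy data: two tight groups far apart.
data = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
two_clusters = bounded_score(data, [0, 0, 1, 1])
one_cluster = bounded_score(data, [0, 0, 0, 0])
print(two_clusters > one_cluster)
```

Because the within-cluster term is defined for a single cluster as well, a score of this kind can still be evaluated when every point carries the same label, which is the situation the abstract says most indices cannot handle.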