Estimation of the number of clusters using multiple clustering validity indices

  • Authors:
  • Krzysztof Kryszczuk; Paul Hurley

  • Affiliations:
  • IBM Zurich Research Laboratory, Switzerland; IBM Zurich Research Laboratory, Switzerland

  • Venue:
  • MCS'10 Proceedings of the 9th international conference on Multiple Classifier Systems
  • Year:
  • 2010

Abstract

One of the challenges in unsupervised machine learning is finding the number of clusters in a dataset. Clustering Validity Indices (CVIs) are popular tools for addressing this problem. A large number of CVIs have been proposed, and comparative studies suggest that no single CVI consistently outperforms the others. Following suggestions found in prior work, in this paper we formalize the concept of using multiple CVIs for cluster number estimation in the framework of multi-classifier fusion. Using a large number of datasets, we show that decision-level fusion of multiple CVIs can lead to significant gains in accuracy in estimating the number of clusters, in particular for high-dimensional datasets with a large number of clusters.
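
The idea of decision-level fusion can be illustrated with a minimal sketch. The abstract does not state which CVIs or which fusion rules the paper uses, so everything below is an assumption made for illustration: the index names (`index_a`, `index_b`, `index_c`), their values, the helper names `estimate_k` and `fuse_decisions`, and the choice of a majority vote with a lower-median tie-break. Each index first makes its own decision (the candidate number of clusters at which it attains its best value), and only those decisions are combined.

```python
from collections import Counter
from statistics import median_low


def estimate_k(scores, higher_is_better=True):
    """Return the candidate k at which a single index attains its best value."""
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)


def fuse_decisions(estimates):
    """Fuse per-index estimates of k by majority vote.

    Ties between equally frequent estimates are broken by taking the
    lower median of the tied values (an assumed rule, not the paper's).
    """
    counts = Counter(estimates)
    top = max(counts.values())
    tied = sorted(k for k, c in counts.items() if c == top)
    return median_low(tied)


# Hypothetical index values for candidate k = 2..5 (illustrative numbers only).
# Each entry is (scores per candidate k, whether larger values are better).
cvi_scores = {
    "index_a": ({2: 0.41, 3: 0.62, 4: 0.55, 5: 0.30}, True),
    "index_b": ({2: 110.0, 3: 180.0, 4: 240.0, 5: 200.0}, True),
    "index_c": ({2: 1.20, 3: 0.70, 4: 0.90, 5: 1.10}, False),
}

# Step 1: every index makes its own decision about k.
estimates = [estimate_k(s, hib) for s, hib in cvi_scores.values()]

# Step 2: the decisions, not the raw index values, are fused.
fused_k = fuse_decisions(estimates)
print(estimates, fused_k)
```

Fusing at the decision level sidesteps the fact that different indices live on different scales and may be maximized or minimized, since only the selected k from each index enters the vote.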