Estimation of the number of clusters using multiple clustering validity indices

Authors:
Krzysztof Kryszczuk; Paul Hurley
Affiliations:
IBM Zurich Research Laboratory, Switzerland (both authors)
Venue:
MCS'10: Proceedings of the 9th International Conference on Multiple Classifier Systems
Year:
2010

Abstract

One of the challenges in unsupervised machine learning is finding the number of clusters in a dataset. Clustering Validity Indices (CVIs) are popular tools for addressing this problem. A large number of CVIs have been proposed, and studies comparing them suggest that no single CVI consistently outperforms the others. Following suggestions found in prior work, in this paper we formalize the use of multiple CVIs for cluster-number estimation within the framework of multi-classifier fusion. Using a large number of datasets, we show that decision-level fusion of multiple CVIs can lead to significant gains in accuracy when estimating the number of clusters, in particular for high-dimensional datasets with a large number of clusters.
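The abstract does not say which CVIs or which fusion rule the paper uses. As a rough illustration of decision-level fusion, the pure-Python sketch below makes several assumptions of its own: three common indices (Davies-Bouldin, Calinski-Harabasz, and the silhouette), a plain k-means with deterministic farthest-point seeding, and simple majority voting over the number of clusters each index prefers. All helper names (`estimate_k`, `kmeans`, `CVIS`, and so on) are hypothetical, not the authors' code.

```python
import math
import random
from collections import Counter


def farthest_point_init(points, k):
    # Deterministic seeding (assumed): start at points[0], then repeatedly
    # add the point farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    # Label each point with the index of its nearest center.
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def centroid(members):
    return tuple(sum(xs) / len(members) for xs in zip(*members))


def kmeans(points, k, iters=100):
    # Lloyd's algorithm; stops once the centers no longer move.
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers)


def group(points, labels):
    # Turn a label vector into a list of non-empty clusters.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return list(clusters.values())


def davies_bouldin(clusters):  # lower is better
    cents = [centroid(c) for c in clusters]
    spread = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    return sum(max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                   for j in range(k) if j != i) for i in range(k)) / k


def calinski_harabasz(clusters):  # higher is better
    points = [p for c in clusters for p in c]
    n, k = len(points), len(clusters)
    overall = centroid(points)
    cents = [centroid(c) for c in clusters]
    between = sum(len(c) * math.dist(m, overall) ** 2 for c, m in zip(clusters, cents))
    within = sum(math.dist(p, m) ** 2 for c, m in zip(clusters, cents) for p in c)
    return (between / (k - 1)) / (within / (n - k))


def silhouette(clusters):  # higher is better
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for j, other in enumerate(clusters) if j != i)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Each CVI is paired with the selector (min or max) that picks its preferred k.
CVIS = {
    "davies_bouldin": (davies_bouldin, min),
    "calinski_harabasz": (calinski_harabasz, max),
    "silhouette": (silhouette, max),
}


def estimate_k(points, k_range=range(2, 7)):
    partitions = {k: group(points, kmeans(points, k)) for k in k_range}
    # All three indices need at least two non-empty clusters.
    valid = [k for k in k_range if len(partitions[k]) >= 2]
    votes = {name: pick(valid, key=lambda k: index(partitions[k]))
             for name, (index, pick) in CVIS.items()}
    tally = Counter(votes.values())
    # Decision-level fusion by majority vote; ties go to the smaller k (assumed rule).
    fused = max(tally, key=lambda k: (tally[k], -k))
    return fused, votes


if __name__ == "__main__":
    rng = random.Random(0)
    blobs = [(0, 0), (10, 0), (0, 10)]
    data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)) for cx, cy in blobs for _ in range(20)]
    print(estimate_k(data))
```

Majority voting is only one decision-level rule; weighted voting or another combiner from the classifier-fusion literature could replace the last two lines of `estimate_k` without touching the per-index code.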