Relative clustering validity criteria: A comparative overview

Authors:
Lucas Vendramin;Ricardo J. G. B. Campello;Eduardo R. Hruschka
Affiliations:
Department of Computer Sciences of the University of São Paulo at São Carlos, C.P. 668, São Carlos, Brazil;Department of Computer Sciences of the University of São Paulo at São Carlos, C.P. 668, São Carlos, Brazil;Department of Computer Sciences of the University of São Paulo at São Carlos, C.P. 668, São Carlos, Brazil
Venue:
Statistical Analysis and Data Mining
Year:
2010

Cited 13

Evolutionary clustering of relational data
International Journal of Hybrid Intelligent Systems - Advances in Intelligent Agent Systems

Efficiency issues of evolutionary k-means
Applied Soft Computing

Evolutionary fuzzy clustering of relational data
Theoretical Computer Science

Objective function-based clustering
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Automatic aspect discrimination in data clustering
Pattern Recognition

Determining the number of clusters with rate-distortion curve modeling
ICIAR'12 Proceedings of the 9th international conference on Image Analysis and Recognition - Volume Part I

Relative Validity Criteria for Community Mining Algorithms
ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)

An ensemble clustering model for mining concept drifting stream data in emergency management
DM-IKM '12 Proceedings of the Data Mining and Intelligent Knowledge Management Workshop

Comparing relational and non-relational algorithms for clustering propositional data
Proceedings of the 28th Annual ACM Symposium on Applied Computing

On the combination of relative clustering validity criteria
Proceedings of the 25th International Conference on Scientific and Statistical Database Management

Cluster ensemble selection based on relative validity indexes
Data Mining and Knowledge Discovery

Evolutionary k-means for distributed data sets
Neurocomputing

Ensembles for unsupervised outlier detection: challenges and research questions a position paper
ACM SIGKDD Explorations Newsletter

Abstract

Many relative clustering validity criteria exist that are useful in practice as quantitative measures of the quality of data partitions, and new criteria continue to be proposed. Each criterion has particular features that may allow it to outperform others on specific classes of problems, and their computational requirements may differ completely. Faced with such a variety of options, the user has a hard time choosing a specific criterion. For this reason, a relevant issue in cluster analysis is comparing the performance of existing validity criteria and, possibly, that of a newly proposed criterion. However, the comparison paradigm traditionally adopted in the literature is subject to some conceptual limitations. This paper describes an alternative, possibly complementary methodology for comparing clustering validity criteria and uses it to carry out an extensive comparison of 40 criteria over a collection of 962,928 partitions derived from five well-known clustering algorithms and 1080 different data sets of a given class of interest. A detailed review of the relative criteria under investigation is also provided, including an original comparative asymptotic analysis of their computational complexities. This work is intended to complement the classic study reported in 1985 by Milligan and Cooper and to thoroughly extend a preliminary paper by the authors themselves.

Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 209-235, 2010