On the combination of relative clustering validity criteria

Authors:
Lucas Vendramin; Pablo A. Jaskowiak; Ricardo J. G. B. Campello
Affiliations:
Universidade de São Paulo (USP), São Carlos, São Paulo, Brazil; Universidade de São Paulo (USP), São Carlos, São Paulo, Brazil; Universidade de São Paulo (USP), São Carlos, São Paulo, Brazil
Venue:
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Year:
2013

Abstract

Many relative clustering validity criteria exist, and they are useful as quantitative measures of the quality of data partitions. Each criterion has particular features that may make it more suitable for specific classes of problems. However, the user usually does not know a priori how well a given criterion will perform, so choosing one is not a trivial task. One possible way around this difficulty is to combine different relative criteria in order to obtain more robust evaluations. So far, this approach has been applied only in an ad hoc fashion, and its real potential is not well understood. In this paper, we present an extensive study on the combination of relative criteria, considering both synthetic and real datasets. The experiments involved 28 criteria and 4 different combination strategies applied to a varied collection of data partitions produced by 5 clustering algorithms. In total, 427,680 partitions of 972 synthetic datasets and 14,000 partitions of a collection of 400 image datasets were considered. Based on the results, we discuss the shortcomings and possible benefits of combining different relative criteria into a committee.
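
To make the idea of a committee of criteria concrete, the sketch below combines the scores of several relative criteria by averaging their ranks over a set of candidate partitions. This is only an illustration: the abstract does not say which four combination strategies were studied, so mean-rank aggregation is an assumption here, as are the three criteria chosen (silhouette, Calinski-Harabasz, Davies-Bouldin), the helper names `ranks` and `committee_mean_rank`, and all score values, which are hypothetical.

```python
from statistics import mean


def ranks(scores, higher_is_better=True):
    """Rank candidates from 1 (best) to n; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next candidate has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def committee_mean_rank(criteria):
    """criteria: list of (scores, higher_is_better) pairs, one per criterion.

    Returns the mean rank of each candidate partition across all criteria
    (assumed combination rule, not necessarily one used in the paper).
    """
    per_criterion = [ranks(scores, hib) for scores, hib in criteria]
    return [mean(column) for column in zip(*per_criterion)]


# Hypothetical scores for three candidate partitions (e.g. k = 2, 3, 4).
silhouette = [0.41, 0.62, 0.55]             # higher is better
calinski_harabasz = [120.0, 310.0, 335.0]   # higher is better
davies_bouldin = [1.10, 0.62, 0.80]         # lower is better

mean_ranks = committee_mean_rank([
    (silhouette, True),
    (calinski_harabasz, True),
    (davies_bouldin, False),
])

# The committee selects the partition with the lowest mean rank.
best = min(range(len(mean_ranks)), key=mean_ranks.__getitem__)
print(mean_ranks, best)
```

Working on ranks rather than raw scores sidesteps the fact that the criteria live on different scales and may be either maximized or minimized. In this toy input the criteria disagree (Calinski-Harabasz prefers the third partition, the other two prefer the second), and the committee settles on the second.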