Weighted rank aggregation of cluster validation measures

Authors:
Vasyl Pihur;Susmita Datta;Somnath Datta
Affiliations:
-;-;-
Venue:
Bioinformatics
Year:
2007

Citing 0
Cited 11

Adaptive clustering for time series: Application for identifying cell cycle expressed genes
Computational Statistics & Data Analysis

A novel measure for evaluating an ordered list: application in microRNA target prediction
ISB '10 Proceedings of the International Symposium on Biocomputing

Search computing: integrating ranked data in the life sciences
DILS'10 Proceedings of the 7th international conference on Data integration in the life sciences

An integrative approach to infer regulation programs in a transcription regulatory module network
Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Chapter 15: search computing and the life sciences
Search Computing

A Biologically Inspired Validity Measure for Comparison of Clustering Methods over Metabolic Data Sets
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Weighted Markov Chain Based Aggregation of Biomolecule Orderings
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Score based aggregation of microRNA target orderings
ISBRA'12 Proceedings of the 8th international conference on Bioinformatics Research and Applications

MicroClAn: Microarray clustering analysis
Journal of Parallel and Distributed Computing

On the combination of relative clustering validity criteria
Proceedings of the 25th International Conference on Scientific and Statistical Database Management

How Many Clusters: A Validation Index for Arbitrary-Shaped Clusters
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Quantified Score

Hi-index	3.84

Abstract

Motivation: Biologists often employ clustering techniques in the exploratory phase of microarray data analysis to discover relevant biological groupings. Given the availability of numerous clustering algorithms in the machine-learning literature, a user might want to select the one that performs best for their data set or application. Various validation measures have been proposed over the years to judge the quality of the clusters produced by a given clustering algorithm, including their biological relevance. Unfortunately, a given clustering algorithm can perform poorly under one validation measure while outperforming many other algorithms under another. A manual synthesis of results from multiple validation measures is nearly impossible in practice, especially when a large number of clustering algorithms are to be compared using several measures. An automated and objective way of reconciling the rankings is needed.

Results: Using a Monte Carlo cross-entropy algorithm, we successfully combine the ranks of a set of clustering algorithms under consideration via a weighted aggregation that optimizes a distance criterion. The proposed weighted rank aggregation allows for a far more objective and automated assessment of clustering results than a simple visual inspection. We illustrate our procedure using one simulated and three real gene expression data sets from various platforms, where we rank a total of 11 clustering algorithms using a combined examination of 10 different validation measures. The aggregate rankings were found for a given number of clusters k and also for an entire range of k.

Availability: R code for all validation measures and rank aggregation is available from the authors upon request.

Contact: somnath.datta@louisville.edu

Supplementary information: Supplementary information is available at http://www.somnathdatta.org/Supp/RankCluster/supp.htm.
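
To make the idea of a weighted aggregation that optimizes a distance criterion concrete, here is a minimal Python sketch. It is not the authors' R code. The abstract does not name the distance, so the Spearman footrule used here is an assumption. The search is exhaustive over all permutations, which stands in for the Monte Carlo cross-entropy optimizer the abstract describes and is only feasible for a handful of algorithms. The function names (`footrule`, `aggregate`), the algorithm labels, and the weights are all hypothetical.

```python
from itertools import permutations


def footrule(candidate, ranked_list):
    """Spearman footrule distance: sum of absolute rank differences.

    Assumed distance; the abstract only says "a distance criterion".
    """
    pos_c = {item: rank for rank, item in enumerate(candidate)}
    pos_l = {item: rank for rank, item in enumerate(ranked_list)}
    return sum(abs(pos_c[item] - pos_l[item]) for item in candidate)


def aggregate(ranked_lists, weights):
    """Return the ordering that minimizes the weighted sum of distances.

    Exhaustive search over all permutations; a stand-in for the
    cross-entropy Monte Carlo search, usable only for small inputs.
    """
    items = sorted(ranked_lists[0])

    def objective(candidate):
        return sum(w * footrule(candidate, lst)
                   for w, lst in zip(weights, ranked_lists))

    best = min(permutations(items), key=objective)
    return list(best), objective(best)


# Hypothetical input: each list ranks the same four clustering
# algorithms (best first) under one validation measure.
lists = [
    ["hierarchical", "kmeans", "pam", "som"],  # measure A
    ["kmeans", "hierarchical", "pam", "som"],  # measure B
    ["hierarchical", "pam", "kmeans", "som"],  # measure C
]
weights = [0.5, 0.3, 0.2]  # hypothetical importance of each measure

best_order, best_cost = aggregate(lists, weights)
print(best_order, best_cost)
```

The weights let a user express how much each validation measure should count. With 11 algorithms, as in the study, there are 11! (about 40 million) candidate orderings, so exhaustive search becomes costly, which is one reason to prefer a stochastic optimizer.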