Graph-based consensus clustering for class discovery from gene expression data

Authors:
Zhiwen Yu; Hau-San Wong; Hongqiang Wang
Venue:
Bioinformatics
Year:
2007


Abstract

Motivation: Consensus clustering, also known as cluster ensemble, is an important technique for microarray data analysis and is particularly useful for class discovery from microarray data. Compared with traditional clustering algorithms, consensus clustering approaches can integrate multiple partitions from different cluster solutions to improve the robustness, stability, scalability and parallelization of the clustering algorithms. With consensus clustering, one can discover the underlying classes of the samples in gene expression data.

Results: In addition to exploring a graph-based consensus clustering (GCC) algorithm to estimate the underlying classes of the samples in microarray data, we also design a new validation index to determine the number of classes in microarray data. To our knowledge, this is the first application of GCC to class discovery for microarray data. Given a pre-specified maximum number of classes (denoted as Kmax in this article), our algorithm can discover the true number of classes for the samples in microarray data according to a new cluster validation index called the Modified Rand Index. Experiments on gene expression data indicate that our new algorithm can (i) outperform most of the existing algorithms, (ii) identify the number of classes correctly in real cancer datasets, and (iii) discover the classes of samples with biological meaning.

Availability: Matlab source code for the GCC algorithm is available upon request from Zhiwen Yu.

Contact: yuzhiwen@cs.cityu.edu.hk and cshswong@cityu.edu.hk

Supplementary information: Supplementary data are available at Bioinformatics online.
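
The general idea of combining several partitions into one consensus partition can be illustrated with a minimal Python sketch. It is not the authors' GCC algorithm, and it does not implement the Modified Rand Index, neither of which is specified in the abstract. As stand-ins, it assumes a co-association matrix, a simple graph step (connected components of the thresholded co-association graph) and the plain Rand index. The function names, the 0.5 threshold and the toy partitions are all hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0 / len(partitions)
    return matrix


def consensus_by_components(matrix, threshold=0.5):
    """Link samples whose co-association exceeds the threshold (an assumed
    value), then return the connected components as consensus labels."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def rand_index(a, b):
    """Plain Rand index: share of sample pairs on which two partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Three hypothetical base partitions of six samples (toy data, not from the paper).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
consensus = consensus_by_components(co_association(partitions))
print(consensus)
print([rand_index(consensus, p) for p in partitions])
```

In this toy case the first two base partitions differ only by a relabeling, so the Rand index treats them as identical, while the third disagrees with the consensus on every pair involving the third sample. A validation index of this kind could in principle be evaluated for each candidate number of classes up to Kmax, but how the paper does so is not described in the abstract.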