Motivation: Consensus clustering, also known as cluster ensemble, is an important technique for microarray data analysis and is particularly useful for class discovery from microarray data. Compared with traditional clustering algorithms, consensus clustering approaches can integrate multiple partitions from different cluster solutions to improve the robustness, stability, scalability and parallelization of the clustering algorithms. With consensus clustering, one can discover the underlying classes of the samples in gene expression data.

Results: In addition to exploring a graph-based consensus clustering (GCC) algorithm to estimate the underlying classes of the samples in microarray data, we also design a new validation index to determine the number of classes in microarray data. To our knowledge, this is the first time that GCC has been applied to class discovery for microarray data. Given a prespecified maximum number of classes (denoted as Kmax in this article), our algorithm can discover the true number of classes for the samples in microarray data according to a new cluster validation index called the Modified Rand Index. Experiments on gene expression data indicate that our new algorithm can (i) outperform most of the existing algorithms, (ii) identify the number of classes correctly in real cancer datasets, and (iii) discover the classes of samples with biological meaning.

Availability: Matlab source code for the GCC algorithm is available upon request from Zhiwen Yu.

Contact: yuzhiwen@cs.cityu.edu.hk and cshswong@cityu.edu.hk

Supplementary information: Supplementary data are available at Bioinformatics online.
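
To make the general idea of graph-based consensus clustering concrete, the following is a minimal sketch and not the authors' GCC algorithm. It assumes a simple stand-in scheme: samples are graph nodes, each edge is weighted by the fraction of base partitions that place its two samples in the same cluster, edges above a threshold are kept, and connected components become the consensus classes. The agreement measure shown is the plain Rand index, not the Modified Rand Index, which the abstract does not define. All function names, the threshold of 0.5 and the toy partitions are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Edge weight for each sample pair: the fraction of base
    partitions that assign both samples to the same cluster."""
    n = len(partitions[0])
    weights = {}
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in partitions if labels[i] == labels[j])
        weights[(i, j)] = together / len(partitions)
    return weights


def consensus_by_components(partitions, threshold=0.5):
    """Stand-in graph partitioning (an assumption, not the paper's method):
    keep edges whose weight exceeds `threshold`, then label each
    connected component as one consensus class."""
    n = len(partitions[0])
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), weight in co_association(partitions).items():
        if weight > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def rand_index(a, b):
    """Plain Rand index: the fraction of sample pairs on which two
    partitions agree (both together or both apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum(1 for i, j in pairs if (a[i] == a[j]) == (b[i] == b[j]))
    return agree / len(pairs)


# Three hypothetical base partitions of six samples.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
consensus = consensus_by_components(base_partitions)
print(consensus)
print([rand_index(consensus, p) for p in base_partitions])
```

In a class-discovery setting, a score of this kind could be computed for each candidate number of classes up to Kmax and the best-scoring candidate retained; how the paper actually does this depends on the definition of the Modified Rand Index given in the full text.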