Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data

Authors:
Stefano Monti; Pablo Tamayo; Jill Mesirov; Todd Golub
Affiliations:
Whitehead Institute/MIT Center for Genome Research, One Kendall Square, Cambridge, MA 02139, USA (all authors). smonti@genome.wi.mit.edu; tamayo@genome.wi.mit.edu; mesirov@genome.wi.mit.edu; golub@genome.wi.mit.edu
Venue:
Machine Learning
Year:
2003

Abstract

In this paper we present a new methodology for class discovery and clustering validation tailored to the task of analyzing gene expression data. The method is best thought of as an analysis approach that guides and assists the use of any of a wide range of available clustering algorithms. We call the new methodology consensus clustering. In conjunction with resampling techniques, it provides a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters. The method can also be used to represent the consensus over multiple runs of a clustering algorithm with random restart (such as K-means, model-based Bayesian clustering, SOM, etc.), so as to account for its sensitivity to the initial conditions. Finally, it provides a visualization tool to inspect cluster number, membership, and boundaries. We present the results of our experiments on both simulated data and real gene expression data, aimed at evaluating the effectiveness of the methodology in discovering biologically meaningful clusters.
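The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmeans`, `consensus_matrix`), the subsampling fraction, the number of runs, and the choice of a plain Lloyd's K-means as the base algorithm are all assumptions made for illustration. Each run clusters a random subsample, and the consensus value for a pair of items is the fraction of runs in which the two items were clustered together, out of the runs in which both were sampled.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means on tuples of floats; returns one label per point.

    Stands in for "any clustering algorithm"; a hypothetical helper.
    """
    centers = rng.sample(points, k)  # random restart: k distinct items
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def consensus_matrix(points, k, runs=50, frac=0.8, seed=0):
    """Fraction of runs in which items i and j share a cluster,
    counted only over runs where both were in the subsample."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # co-clustered counts
    sampled = [[0] * n for _ in range(n)]   # co-sampled counts
    for _ in range(runs):
        idx = rng.sample(range(n), int(frac * n))  # subsample, no replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[a][b] += 1
            sampled[b][a] += 1
            if la == lb:
                together[a][b] += 1
                together[b][a] += 1
    return [
        [1.0 if i == j
         else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Toy data (an assumption for illustration): two well-separated 1-D groups.
toy = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,),
       (5.0,), (5.1,), (5.2,), (5.3,), (5.4,)]
M = consensus_matrix(toy, k=2)
```

Entries of `M` near 1 or 0 suggest stable membership, while intermediate values suggest an unstable assignment. Reordering the rows and columns of `M` by cluster and drawing it as a heat map would give a visual check of cluster number and boundaries. Setting `frac=1.0` in this sketch drops the subsampling, so the matrix then summarizes agreement across random restarts only.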