Bayesian mixture model based clustering of replicated microarray data

Authors:
M. Medvedovic; K.Y. Yeung; R.E. Bumgarner
Affiliations:
Department of Environmental Health, Center for Genome Information, University of Cincinnati Medical Center, 3223 Eden Avenue ML 56, Cincinnati, OH 45267-0056, USA; Department of Microbiology, Box 358070, University of Washington, Seattle, WA 98195, USA; Department of Microbiology, Box 358070, University of Washington, Seattle, WA 98195, USA
Venue:
Bioinformatics
Year:
2004

Citing 0
Cited 17

WaveRead: automatic measurement of relative gene expression levels from microarrays using wavelet analysis

Journal of Biomedical Informatics
Dynamic agglomerative clustering of gene expression profiles

Pattern Recognition Letters
Use of SVD-based probit transformation in clustering gene expression profiles

Computational Statistics & Data Analysis
A Customized Class of Functions for Modeling and Clustering Gene Expression Profiles in Embryonic Stem Cells

BSB '08 Proceedings of the 3rd Brazilian symposium on Bioinformatics: Advances in Bioinformatics and Computational Biology
Mining aggregates of over-the-counter products for syndromic surveillance

Pattern Recognition Letters
Gene Clustering via Integrated Markov Models Combining Individual and Pairwise Features

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Estimating the number of clusters via system evolution for cluster analysis of gene expression data

IEEE Transactions on Information Technology in Biomedicine - Special section on computational intelligence in medical systems
Inferential Clustering Approach for Microarray Experiments with Replicated Measurements

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Modeling and Visualizing Uncertainty in Gene Expression Clusters Using Dirichlet Process Mixtures

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Matrix factorisation methods applied in microarray data analysis

International Journal of Data Mining and Bioinformatics
Similarity analysis in Bayesian random partition models

Computational Statistics & Data Analysis
Characterizing cell types through differentially expressed gene clusters using a model-based approach

ICDM'11 Proceedings of the 11th international conference on Advances in data mining: applications and theoretical aspects
A new test system for stability measurement of marker gene selection in DNA microarray data analysis

PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
Gibbs sampler-based coordination of autonomous swarms

Automatica (Journal of IFAC)
Robust Bayesian Clustering for Replicated Gene Expression Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
On two-way Bayesian agglomerative clustering of gene expression data

Statistical Analysis and Data Mining
Feature selection based on cluster and variability analyses for ordinal multi-class classification problems

Knowledge-Based Systems

Abstract

Motivation: Identifying patterns of co-expression in microarray data by cluster analysis has been a productive approach to uncovering molecular mechanisms underlying the biological processes under investigation. Using experimental replicates can generally improve the precision of the cluster analysis by reducing the experimental variability of measurements. In such situations, Bayesian mixtures allow for an efficient use of information by precisely modeling between-replicates variability.

Results: We developed different variants of Bayesian mixture based clustering procedures for clustering gene expression data with experimental replicates. In this approach, the statistical distribution of microarray data is described by a Bayesian mixture model. Clusters of co-expressed genes are created from the posterior distribution of clusterings, which is estimated by a Gibbs sampler. We define infinite and finite Bayesian mixture models with different between-replicates variance structures and investigate their utility by analyzing synthetic and real-world datasets. Results of our analyses demonstrate that (1) improvements in precision achieved by performing only two experimental replicates can be dramatic when the between-replicates variability is high, (2) precise modeling of intra-gene variability is important for accurate identification of co-expressed genes and (3) the infinite mixture model with the 'elliptical' between-replicates variance structure performed overall better than any other method tested. We also introduce a heuristic modification to the Gibbs sampler based on the 'reverse annealing' principle. This modification effectively overcomes the tendency of the Gibbs sampler to converge to different modes of the posterior distribution when started from different initial positions. Finally, we demonstrate that the Bayesian infinite mixture model with 'elliptical' variance structure is capable of identifying the underlying structure of the data without knowing the 'correct' number of clusters.

Availability: The MS Windows™ based program named Gaussian Infinite Mixture Modeling (GIMM) implementing the Gibbs sampler and corresponding C++ code are available at http://homepages.uc.edu/~medvedm/GIMM.htm

Supplemental information: http://expression.microslu.washington.edu/expression/kayee/medvedovic2003/medvedovic_bioinf2003.html
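
Illustrative sketch: The abstract states that clusters are created from the posterior distribution of clusterings estimated by a Gibbs sampler, but it does not say how the sampled clusterings are summarized. One common way to summarize such output, assumed here purely for illustration, is to estimate for each pair of genes the fraction of sampled clusterings in which the two genes share a cluster. The function name `coclustering_frequencies` and the toy `samples` below are hypothetical and are not taken from GIMM or its C++ code.

```python
from itertools import combinations


def coclustering_frequencies(label_samples):
    """Return, for each gene pair (i, j) with i < j, the fraction of
    sampled clusterings in which genes i and j carry the same label.

    label_samples: list of clusterings, one per retained Gibbs sweep;
    each clustering is a list of cluster labels, one per gene.
    Label values are arbitrary; only equality within a sweep matters.
    """
    n_samples = len(label_samples)
    n_genes = len(label_samples[0])
    counts = {pair: 0 for pair in combinations(range(n_genes), 2)}
    for labels in label_samples:
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / n_samples for pair, c in counts.items()}


# Hypothetical sampler output: 4 retained sweeps over 4 genes.
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [1, 1, 0, 0],  # same partition as the first sweep, labels permuted
    [0, 1, 2, 2],
]
freq = coclustering_frequencies(samples)
print(freq[(0, 1)])  # genes 0 and 1 share a cluster in 3 of 4 sweeps
```

Because only label equality within a sweep is compared, the summary is unaffected by the arbitrary relabeling of clusters between sweeps. The resulting pairwise frequencies could then serve as a similarity measure for a conventional hierarchical clustering step, which is again an assumption about usage rather than a description of the published procedure.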