Motivation: Identifying patterns of co-expression in microarray data by cluster analysis has been a productive approach to uncovering the molecular mechanisms underlying the biological processes under investigation. Using experimental replicates generally improves the precision of cluster analysis by reducing the experimental variability of measurements. In such situations, Bayesian mixtures allow efficient use of the available information by precisely modeling between-replicates variability.

Results: We developed several variants of Bayesian mixture-model-based procedures for clustering gene expression data with experimental replicates. In this approach, the statistical distribution of microarray data is described by a Bayesian mixture model. Clusters of co-expressed genes are created from the posterior distribution of clusterings, which is estimated by a Gibbs sampler. We define infinite and finite Bayesian mixture models with different between-replicates variance structures and investigate their utility by analyzing synthetic and real-world datasets. Results of our analyses demonstrate that (1) the improvement in precision achieved by performing only two experimental replicates can be dramatic when the between-replicates variability is high, (2) precise modeling of intra-gene variability is important for accurate identification of co-expressed genes, and (3) the infinite mixture model with the 'elliptical' between-replicates variance structure performed better overall than any other method tested. We also introduce a heuristic modification to the Gibbs sampler based on the 'reverse annealing' principle. This modification effectively overcomes the tendency of the Gibbs sampler to converge to different modes of the posterior distribution when started from different initial positions.
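The Gibbs sampler for an infinite mixture can be sketched for the simplest case. The code below is a minimal collapsed Gibbs sampler for a Dirichlet-process (infinite) Gaussian mixture. It is not the GIMM program and makes several assumptions. It uses a spherical variance structure with known, fixed variances, set by the hypothetical parameters `gene_var` (gene-to-cluster variability) and `rep_var` (between-replicates variability). It places a conjugate normal prior on cluster means. Each gene is summarized by its replicate average, whose variance shrinks as `rep_var / R_i` when more replicates are taken. The sketch omits the 'elliptical' variance structure and the reverse-annealing heuristic.

```python
import numpy as np


def log_normal(x, mean, var):
    """Log density of N(mean, var * I) evaluated at vector x."""
    d = x.shape[0]
    return -0.5 * (d * np.log(2.0 * np.pi * var) + np.sum((x - mean) ** 2) / var)


def gibbs_dp_mixture(y_bar, n_reps, alpha=1.0, gene_var=0.1, rep_var=1.0,
                     tau2=4.0, m0=0.0, n_sweeps=50, seed=0):
    """Collapsed Gibbs sampler for a Dirichlet-process Gaussian mixture (sketch).

    y_bar  : (G, D) per-gene averages over replicates (hypothetical layout)
    n_reps : (G,) number of replicates R_i behind each average
    Assumed model: y_bar_i ~ N(mu_k, v_i I) with v_i = gene_var + rep_var / R_i,
    mu_k ~ N(m0, tau2 I), labels from a Chinese restaurant process(alpha).
    Returns one compact label vector (0..K-1) per sweep.
    """
    rng = np.random.default_rng(seed)
    G, _ = y_bar.shape
    v = gene_var + rep_var / np.asarray(n_reps, dtype=float)  # per-gene variance
    z = np.zeros(G, dtype=int)  # initial state: all genes in one cluster
    samples = []
    for _ in range(n_sweeps):
        for i in range(G):
            z[i] = -1  # take gene i out of its cluster
            labels = np.unique(z[z >= 0])
            log_p = []
            for k in labels:
                members = z == k
                w = 1.0 / v[members]              # precisions of member genes
                prec = 1.0 / tau2 + w.sum()       # posterior precision of mu_k
                mean = (m0 / tau2 + (w[:, None] * y_bar[members]).sum(axis=0)) / prec
                # CRP weight n_k times posterior predictive of y_bar_i
                log_p.append(np.log(members.sum())
                             + log_normal(y_bar[i], mean, 1.0 / prec + v[i]))
            # Opening a new cluster: weight alpha times the prior predictive.
            log_p.append(np.log(alpha) + log_normal(y_bar[i], m0, tau2 + v[i]))
            log_p = np.asarray(log_p)
            p = np.exp(log_p - log_p.max())       # normalize in a stable way
            choice = rng.choice(len(p), p=p / p.sum())
            if choice < len(labels):
                z[i] = labels[choice]
            else:
                z[i] = labels.max() + 1 if len(labels) else 0
        z = np.unique(z, return_inverse=True)[1]  # relabel to 0..K-1
        samples.append(z.copy())
    return samples
```

The number of clusters is not supplied. It is whatever number of occupied clusters the sampler visits. A larger `alpha` makes opening a new cluster more likely.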
Finally, we demonstrate that the Bayesian infinite mixture model with the 'elliptical' variance structure is capable of identifying the underlying structure of the data without knowing the 'correct' number of clusters.

Availability: The MS Windows™-based program Gaussian Infinite Mixture Modeling (GIMM), which implements the Gibbs sampler, and the corresponding C++ code are available at http://homepages.uc.edu/~medvedm/GIMM.htm

Supplemental information: http://expression.microslu.washington.edu/expression/kayee/medvedovic2003/medvedovic_bioinf2003.html
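Clusters are formed from the posterior distribution of clusterings rather than from a single Gibbs draw. One simple summary is to estimate, for each pair of genes, the fraction of sampled partitions that place them together. This summary is assumed here and is not necessarily the procedure GIMM uses. The pairwise fraction is unaffected by label switching between samples. Genes whose co-clustering probability exceeds a hypothetical `threshold` are then linked, and the connected components of those links are reported, which amounts to a single-linkage cut. The number of clusters again comes from the data rather than being fixed in advance.

```python
import numpy as np


def coclustering_matrix(samples):
    """Fraction of sampled partitions in which genes i and j share a cluster."""
    samples = np.asarray(samples)                        # shape (S, G)
    same = samples[:, :, None] == samples[:, None, :]    # (S, G, G) booleans
    return same.mean(axis=0)


def threshold_clusters(P, threshold=0.5):
    """Link genes with co-clustering probability above `threshold` and return
    the connected components as compact labels (single-linkage cut)."""
    G = P.shape[0]
    parent = list(range(G))  # union-find forest

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(G):
        for j in range(i + 1, G):
            if P[i, j] > threshold:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(G)]
    return np.unique(roots, return_inverse=True)[1]
```

For example, the four partitions `[0, 0, 1, 1, 2]`, `[0, 0, 1, 1, 1]`, `[1, 1, 0, 0, 2]` and `[0, 0, 0, 1, 2]` give a co-clustering probability of 0.75 for genes 2 and 3. The threshold cut yields the labels `[0, 0, 1, 1, 2]`.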