Clustering microarray gene expression data using weighted Chinese restaurant process

Authors:
Zhaohui S. Qin
Affiliations:
Center for Statistical Genetics, Department of Biostatistics, School of Public Health, University of Michigan 1420 Washington Heights, Ann Arbor, MI 48109-2029, USA
Venue:
Bioinformatics
Year:
2006

Cited 15

Methodological Review: Towards knowledge-based gene expression data mining
Journal of Biomedical Informatics

Network-Based Inference of Cancer Progression from Microarray Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Towards improving fuzzy clustering using support vector machine: Application to gene expression data
Pattern Recognition

Mining the Largest Dense Vertexlet in a Weighted Scale-free Graph
Fundamenta Informaticae

Modeling and Visualizing Uncertainty in Gene Expression Clusters Using Dirichlet Process Mixtures
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Analysis of microarray data using multiobjective variable string length genetic fuzzy clustering
CEC'09 Proceedings of the Eleventh conference on Congress on Evolutionary Computation

Network-based inference of cancer progression from microarray data
ISBRA'08 Proceedings of the 4th international conference on Bioinformatics research and applications

Nonparametric combinatorial sequence models
RECOMB'11 Proceedings of the 15th Annual international conference on Research in computational molecular biology

Gene expression data analysis with the clustering method based on an improved quantum-behaved Particle Swarm Optimization
Engineering Applications of Artificial Intelligence

Efficient two dimensional clustering of microarray gene expression data by means of hybrid similarity measure
Proceedings of the International Conference on Advances in Computing, Communications and Informatics

Robust Bayesian Clustering for Replicated Gene Expression Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

On two-way Bayesian agglomerative clustering of gene expression data
Statistical Analysis and Data Mining

A semi-supervised hierarchical approach: two-dimensional clustering of microarray gene expression data
Frontiers of Computer Science: Selected Publications from Chinese Universities

Gene expression data clustering using a multiobjective symmetry based clustering technique
Computers in Biology and Medicine

Proximity Measures for Clustering Gene Expression Microarray Data: A Validation Methodology and a Comparative Analysis
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Abstract

Motivation: Clustering microarray gene expression data is a powerful tool for elucidating co-regulatory relationships among genes. Many clustering techniques have been applied successfully, with promising results. However, the substantial fluctuation in microarray data, the unknown number of clusters, and the complex regulatory mechanisms underlying biological systems make the clustering problem highly challenging.

Results: We devised an improved model-based Bayesian approach to cluster microarray gene expression data. Cluster assignment is carried out by an iterative weighted Chinese restaurant seating scheme, so that the optimal number of clusters is determined simultaneously with cluster assignment. The predictive updating technique was applied to improve the efficiency of the Gibbs sampler. An additional step during reassignment allows genes that display complex correlation relationships, such as time-shifted and/or inverted patterns, to be clustered together. Analysis of a real dataset showed that as many as 30% of the significant genes clustered in the same group display complex relationships with the consensus pattern of the cluster. Other notable features include automatic handling of missing data and quantitative measures of cluster strength and assignment confidence. Synthetic and real microarray gene expression datasets were analyzed to demonstrate the method's performance.

Availability: A computer program named Chinese restaurant cluster (CRC) has been developed based on this algorithm. The program can be downloaded at http://www.sph.umich.edu/csg/qin/CRC/

Contact: qin@umich.edu

Supplementary information: http://www.sph.umich.edu/csg/qin/CRC/
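The seating scheme mentioned in the Results can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so this is not the CRC program's implementation. It assumes the standard Chinese restaurant process prior (existing cluster k is favored in proportion to its size, a new cluster in proportion to a concentration parameter `alpha`), multiplied by a likelihood value that the caller supplies for each option. The function names `seating_weights` and `sample_seat`, the parameter `alpha`, and the likelihood inputs are all hypothetical.

```python
import random


def seating_weights(cluster_sizes, likelihoods, new_likelihood, alpha=1.0):
    """Return normalized probabilities for seating one gene.

    Assumption (not taken from the paper): an existing cluster k gets raw
    weight size_k * likelihoods[k], and opening a new cluster gets raw
    weight alpha * new_likelihood. The last entry of the result is the
    probability of opening a new cluster.
    """
    if len(cluster_sizes) != len(likelihoods):
        raise ValueError("one likelihood is needed per existing cluster")
    raw = [n * lik for n, lik in zip(cluster_sizes, likelihoods)]
    raw.append(alpha * new_likelihood)
    total = sum(raw)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in raw]


def sample_seat(probs, rng=random):
    """Draw an index from a discrete distribution by inverse-CDF sampling."""
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against floating-point round-off


if __name__ == "__main__":
    # Two existing clusters with 3 and 1 genes; equal placeholder likelihoods.
    probs = seating_weights([3, 1], [0.5, 0.5], new_likelihood=0.5, alpha=1.0)
    seat = sample_seat(probs, random.Random(0))
    print(probs, seat)
```

Because the option of opening a new cluster always carries some probability, the number of clusters is not fixed in advance and can grow or shrink as genes are reseated over repeated sweeps. That is the sense in which cluster count and assignment are determined together.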