A mixture model with random-effects components for clustering correlated gene-expression profiles

Authors:
S. K. Ng;G. J. McLachlan;K. Wang;L. Ben-Tovim Jones;S.-W. Ng
Affiliations:
Department of Mathematics, University of Queensland, Brisbane, QLD 4072, Australia;Department of Mathematics, University of Queensland, Brisbane, QLD 4072, Australia;ARC Centre for Complex Systems, University of Queensland, Brisbane, QLD 4072, Australia;Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia;Laboratory of Gynecologic Oncology, Department of Obstetrics, Gynecology and Reproductive Biology, Brigham and Women's Hospital, Boston, MA 02115, USA
Venue:
Bioinformatics
Year:
2006

Citing 0
Cited 7

Clustering replicated microarray data via mixtures of random effects models for various covariance structures
WISB '06 Proceedings of the 2006 workshop on Intelligent systems for bioinformatics - Volume 73

Two-way analysis of high-dimensional collinear data
Data Mining and Knowledge Discovery

A GMM-IG framework for selecting genes as expression panel biomarkers
Artificial Intelligence in Medicine

Microarray Time Course Experiments: Finding Profiles
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis
Computational Statistics & Data Analysis

Mixture models for clustering multilevel growth trajectories
Computational Statistics & Data Analysis

Clustering longitudinal profiles using P-splines and mixed effects models applied to time-course gene expression data
Computational Statistics & Data Analysis

Abstract

Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes.

Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data.
In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too.

Availability: A Fortran program called EMMIX-WIRE (EM-based MIXture analysis WIth Random Effects) is available on request from the corresponding author.

Contact: gjm@maths.uq.edu.au

Supplementary information: http://www.maths.uq.edu.au/~gjm/bioinf0602_supp.pdf. Colour versions of Figures 1 and 2 are available as Supplementary material on Bioinformatics online.
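
The Results note that both EM steps are available in closed form, so fitting needs no Monte Carlo approximation. As a rough illustration of what that means, the sketch below runs EM for an ordinary normal mixture with diagonal covariances. It is not EMMIX-WIRE and contains no random-effects terms or covariates; the function name `fit_normal_mixture`, the sorted-split initialisation, the small variance floor and the synthetic profiles are all assumptions made for this example.

```python
import numpy as np


def fit_normal_mixture(y, n_components, n_iter=100):
    """EM for a normal mixture with diagonal covariances (no random effects).

    y is an (n_genes, n_conditions) array of expression profiles.
    Hypothetical helper for illustration only.
    """
    n, p = y.shape
    # Assumed initialisation: sort profiles by average level and split evenly.
    order = np.argsort(y.mean(axis=1))
    means = np.stack([y[idx].mean(axis=0)
                      for idx in np.array_split(order, n_components)])
    variances = np.tile(y.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    loglik = []
    for _ in range(n_iter):
        # E-step (closed form): posterior probability that each profile
        # belongs to each component, computed on the log scale.
        diff = y[:, None, :] - means[None, :, :]                  # (n, g, p)
        log_dens = -0.5 * (np.log(2.0 * np.pi * variances)[None]
                           + diff ** 2 / variances[None]).sum(axis=2)
        log_joint = np.log(weights)[None, :] + log_dens           # (n, g)
        log_norm = np.logaddexp.reduce(log_joint, axis=1)         # (n,)
        tau = np.exp(log_joint - log_norm[:, None])
        loglik.append(log_norm.sum())
        # M-step (closed form): weighted proportions, means and variances.
        nk = tau.sum(axis=0)
        weights = nk / n
        means = (tau.T @ y) / nk[:, None]
        diff = y[:, None, :] - means[None, :, :]
        # The 1e-6 floor is an assumption that guards against zero variances.
        variances = (tau[:, :, None] * diff ** 2).sum(axis=0) / nk[:, None] + 1e-6
    return weights, means, variances, tau, np.array(loglik)


# Synthetic example: 30 low-expression and 30 high-expression profiles
# measured under 4 conditions (made-up data, not from the paper).
rng = np.random.default_rng(0)
low = rng.normal(0.0, 0.5, size=(30, 4))
high = rng.normal(3.0, 0.5, size=(30, 4))
profiles = np.vstack([low, high])

weights, means, variances, tau, loglik = fit_normal_mixture(profiles, n_components=2)
labels = tau.argmax(axis=1)  # hard cluster assignment for each gene
```

Each iteration is a deterministic pair of array operations, which is the property the abstract highlights. The model described in the abstract would additionally have to estimate the random-effects variance components and covariate effects, which this sketch omits.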