Incorporating gene functions as priors in model-based clustering of microarray gene expression data

Authors:
Wei Pan
Affiliations:
Division of Biostatistics, MMC 303, School of Public Health, University of Minnesota, Minneapolis, MN 55455-0392, USA
Venue:
Bioinformatics
Year:
2006

Abstract

Motivation: Cluster analysis of gene expression profiles has been widely applied to group genes for gene function discovery, and many approaches have been proposed. The rationale is that genes with the same biological function, or involved in the same biological process, are more likely to be co-expressed and hence to form a cluster with similar expression patterns. However, most existing methods, including model-based clustering, ignore known gene functions when clustering.

Results: To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions as prior probabilities in model-based clustering. In contrast to the single global mixture model that standard model-based clustering applies to all genes, we use a stratified mixture model: one stratum contains the genes of unknown function, while each of the other strata contains the genes sharing a particular biological function or pathway. Genes in the same stratum are assumed to have the same prior probability of coming from a given cluster, whereas genes in different strata may have different prior probabilities of coming from that cluster. We derive a simple EM algorithm to fit the stratified model. A simulation study and an application to gene function prediction demonstrate the advantage of our proposal over the standard method.

Contact: weip@biostat.umn.edu
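The stratified mixture model and its EM fit can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: Gaussian components with diagonal covariances shared across all strata, each gene belonging to exactly one stratum (stratum 0 standing for genes of unknown function), and a farthest-point initialization. The names stratified_em and log_gauss_diag are hypothetical, and the paper's own covariance model, initialization and handling of multiply annotated genes may differ. The point the sketch makes concrete is where the stratification enters: the E-step uses the prior pi[s, k] of the gene's own stratum, and the M-step re-estimates pi[s, k] by averaging posterior probabilities within stratum s only, while the component means and variances are updated over all genes exactly as in an ordinary mixture.

```python
import numpy as np


def log_gauss_diag(X, mu, var):
    """Log-density of each row of X under N(mu, diag(var)); returns shape (n,)."""
    return -0.5 * (np.sum(np.log(2 * np.pi * var))
                   + np.sum((X - mu) ** 2 / var, axis=1))


def stratified_em(X, strata, K, n_iter=500, tol=1e-8, seed=0):
    """EM for a stratified Gaussian mixture (illustrative sketch, hypothetical name).

    X:      (n, p) expression matrix, one row per gene.
    strata: (n,) integer labels 0..S-1; every stratum must be non-empty.
    Component means/variances are shared by all genes; the mixing
    proportions pi[s, k] depend on the gene's stratum s.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    S = int(strata.max()) + 1
    # Farthest-point initialization of the K component means (an assumption).
    idx = [int(rng.integers(n))]
    for _ in range(K - 1):
        d = np.min([((X - X[j]) ** 2).sum(axis=1) for j in idx], axis=0)
        idx.append(int(np.argmax(d)))
    mu = X[idx].astype(float)
    var = np.tile(X.var(axis=0) + 1e-6, (K, 1))   # (K, p) diagonal variances
    pi = np.full((S, K), 1.0 / K)                  # stratum-specific priors
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log pi[s(i), k] + log phi(x_i; mu_k, var_k), then normalize.
        logp = np.log(np.maximum(pi[strata], 1e-300)) + np.column_stack(
            [log_gauss_diag(X, mu[k], var[k]) for k in range(K)])
        m = logp.max(axis=1, keepdims=True)
        w = np.exp(logp - m)
        loglik = float(np.sum(m.ravel() + np.log(w.sum(axis=1))))
        tau = w / w.sum(axis=1, keepdims=True)     # (n, K) posterior probabilities
        # M-step, priors: average posteriors within each stratum only.
        for s in range(S):
            pi[s] = tau[strata == s].mean(axis=0)
        # M-step, shared component parameters: as in an ordinary mixture.
        nk = tau.sum(axis=0)
        mu = (tau.T @ X) / nk[:, None]
        for k in range(K):
            # Small floor keeps variances strictly positive.
            var[k] = (tau[:, k, None] * (X - mu[k]) ** 2).sum(axis=0) / nk[k] + 1e-6
        if loglik - prev < tol:
            break
        prev = loglik
    return pi, mu, var, tau, loglik


# Demo on simulated data (hypothetical; not the paper's simulation design).
rng = np.random.default_rng(1)
A = rng.normal(0.0, 0.5, size=(150, 4))   # cluster A
B = rng.normal(3.0, 0.5, size=(150, 4))   # cluster B
X = np.vstack([A, B])
strata = np.zeros(300, dtype=int)          # stratum 0: "unknown function"
strata[:100] = 1                           # stratum 1: 100 annotated genes, all from A
pi, mu, var, tau, loglik = stratified_em(X, strata, K=2)
print(np.round(pi, 2))
```

In this toy setting the row of pi for stratum 1 should put nearly all of its mass on the component centred near 0, while the row for stratum 0 should be close to 0.25/0.75, reflecting its 50 cluster-A and 150 cluster-B genes. A standard (unstratified) mixture would instead estimate a single pair of proportions for all 300 genes.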