Gene Clustering via Integrated Markov Models Combining Individual and Pairwise Features

Authors:
Matthieu Vignes;Florence Forbes
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2009

Abstract

Clustering genes into groups that share common characteristics is a useful exploratory technique for a number of subsequent computational analyses. A wide range of clustering algorithms have been proposed, in particular to analyze gene expression data, but most of them treat genes as independent entities or include relevant information on gene interactions in a suboptimal way. We propose a probabilistic model that has the advantage of accounting for individual data (e.g., expression) and pairwise data (e.g., interaction information coming from biological networks) simultaneously. Our model is based on hidden Markov random field models in which parametric probability distributions account for the distribution of individual data. Data on pairs, possibly reflecting distance or similarity measures between genes, are then included through a graph in which the nodes represent the genes and the edges are weighted according to the available interaction information. Being probabilistic, the model has many interesting theoretical features. In addition, preliminary experiments on simulated and real data show promising results and point to the gain from using such an approach. Availability: The software used in this work is written in C++ and is available with other supplementary material at http://mistis.inrialpes.fr/people/forbes/transparentia/supplementary.html.
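
To make the idea of combining individual and pairwise data concrete, the following is a minimal sketch, not the authors' software or estimation procedure. It assumes one-dimensional Gaussian emissions with fixed, known cluster means, a Potts-like interaction term on a weighted gene graph, and a simple iterated conditional modes (ICM) update swapped in for whatever inference the paper actually uses. The function name `icm_cluster` and the parameters `beta` and `sigma` are hypothetical.

```python
def icm_cluster(x, edges, means, sigma=1.0, beta=1.0, n_iter=10):
    """Label genes by ICM on a toy hidden Markov random field (illustrative only).

    x     : list of floats, one individual measurement per gene (assumed 1-D)
    edges : dict mapping (i, j) gene-index pairs to a non-negative weight
    means : list of cluster means, assumed known and fixed
    beta  : strength of the pairwise term (beta=0 ignores the graph)
    """
    n, k = len(x), len(means)

    # Build a weighted adjacency list from the undirected edge dictionary.
    neighbors = [[] for _ in range(n)]
    for (i, j), w in edges.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))

    def log_emission(i, c):
        # Gaussian log-density up to a constant shared by all clusters.
        return -0.5 * ((x[i] - means[c]) / sigma) ** 2

    # Initialise from the individual data alone.
    labels = [max(range(k), key=lambda c: log_emission(i, c)) for i in range(n)]

    for _ in range(n_iter):
        changed = False
        for i in range(n):
            def score(c):
                # Reward agreement with neighbours, weighted by edge strength.
                agree = sum(w for j, w in neighbors[i] if labels[j] == c)
                return log_emission(i, c) + beta * agree
            best = max(range(k), key=score)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Toy data: gene 2 sits between the clusters but interacts with genes 0 and 1.
x = [0.0, 0.1, 1.2, 2.0, 1.9]
edges = {(0, 2): 1.0, (1, 2): 1.0, (3, 4): 1.0}
print(icm_cluster(x, edges, means=[0.0, 2.0], beta=0.0))  # individual data only
print(icm_cluster(x, edges, means=[0.0, 2.0], beta=1.0))  # individual + pairwise data
```

In this toy setting, gene 2 is assigned by its expression value alone when `beta=0`, and is pulled into the cluster of its interaction partners once the graph term is switched on, which is the kind of gain the abstract attributes to using both data types together.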