Physical network models and multi-source data integration

  • Authors:
  • Chen-Hsiang Yeang; Tommi Jaakkola

  • Affiliations:
  • MIT AI Lab, Cambridge, MA; MIT AI Lab, Cambridge, MA

  • Venue:
  • RECOMB '03: Proceedings of the seventh annual international conference on Research in computational molecular biology
  • Year:
  • 2003

Abstract

We develop a new framework for inferring models of transcriptional regulation. The models in this approach, which we call physical models, are constructed on the basis of verifiable molecular attributes of the underlying biological system. The attributes include, for example, the existence of protein-protein and protein-DNA interactions in gene regulatory processes, the directionality of signal transduction in protein-protein interactions, and the signs of the immediate effects of these interactions (e.g., whether an upstream gene activates or represses the downstream genes). Each attribute is included as a variable in the model, and the variables define a collection of annotated random graphs. Possible configurations of these variables (realizations of the underlying biological system) are constrained by the available data sources. Some data sources, such as factor-binding data (location data), involve measurements that are directly tied to the variables in the model. Other sources, such as gene knock-outs, are functional in nature and provide only indirect evidence about the (physical) variables. We associate each knock-out effect in the deletion mutant data with a set of causal paths (molecular cascades) that could in principle explain the effect, resulting in aggregate constraints on the physical variables in the model. The most likely setting of all the variables is found by the max-product algorithm. By testing our approach on datasets related to the pheromone response pathway in S. cerevisiae, we demonstrate that the resulting transcriptional models are consistent with previous studies of the pathway. Moreover, we show that the approach predicts gene knock-out effects with a high degree of accuracy in a cross-validation setting. The method also implicates the molecular cascades likely responsible for each observed knock-out effect. The inference results are robust to variations in the model parameters.
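The idea of explaining a knock-out effect by candidate causal paths can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything in it is a hypothetical assumption: the toy genes A to D, the fixed edge signs (in the paper, existence, direction, and sign are latent variables inferred by max-product), the path-length bound `max_len`, and the convention that deleting a net activator lowers the target while deleting a net repressor raises it. A path counts as an explanation when the product of its edge signs matches the observed effect under that convention.

```python
from typing import Dict, List, Tuple

# Hypothetical annotated graph: directed edge (u, v) -> sign,
# +1 = activation, -1 = repression. Fixed here only for illustration.
Edge = Tuple[str, str]


def candidate_paths(edges: Dict[Edge, int], source: str, target: str,
                    max_len: int = 3) -> List[List[str]]:
    """Enumerate simple directed paths source -> target with at most max_len edges."""
    adj: Dict[str, List[str]] = {}
    for (u, v) in edges:
        adj.setdefault(u, []).append(v)
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == target:
            paths.append(list(path))
            return
        if len(path) - 1 >= max_len:  # path already uses max_len edges
            return
        for nxt in adj.get(node, []):
            if nxt not in path:  # skip cycles; keep paths simple
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(source, [source])
    return paths


def path_sign(edges: Dict[Edge, int], path: List[str]) -> int:
    """Net regulatory sign of a cascade: the product of its edge signs."""
    sign = 1
    for u, v in zip(path, path[1:]):
        sign *= edges[(u, v)]
    return sign


def explaining_paths(edges: Dict[Edge, int], knocked_out: str, affected: str,
                     observed_change: int, max_len: int = 3) -> List[List[str]]:
    """Paths whose net sign is consistent with the observed knock-out effect.

    Assumed convention: observed_change = -1 means the affected gene goes down
    when knocked_out is deleted, which a net-activating path (+1) explains.
    """
    return [p for p in candidate_paths(edges, knocked_out, affected, max_len)
            if path_sign(edges, p) == -observed_change]


# Toy example (hypothetical genes and signs).
toy = {("A", "B"): +1, ("B", "D"): +1, ("A", "C"): +1, ("C", "D"): -1}
```

With this toy graph, deleting A has two candidate cascades to D. If D goes down (`observed_change=-1`), only the all-activating path `A -> B -> D` explains it; if D goes up, only `A -> C -> D`, which ends in a repression, does. In the full model, such path sets become aggregate constraints on the edge variables rather than fixed lookups.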
We can extend the approach to include other data sources, such as time-course expression profiles. We also discuss coordinated regulation and the use of automated experiment design.