Probabilistic graphical models and algorithms for genomic analysis

  • Authors:
  • Poe Xing; Richard Karp; Michael Jordan; Stuart Russell

  • Affiliations:
  • University of California, Berkeley; University of California, Berkeley; University of California, Berkeley; University of California, Berkeley

  • Venue:
  • Probabilistic graphical models and algorithms for genomic analysis
  • Year:
  • 2004

Abstract

In this thesis, I discuss two probabilistic modeling problems arising in metazoan genomic analysis: identifying motifs and cis-regulatory modules (CRMs) in transcriptional regulatory sequences, and inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs). Motif and CRM identification is important for understanding the gene regulatory networks underlying metazoan development and function. I discuss a modular Bayesian model that captures rich structural characteristics of transcriptional regulatory sequences and supports a variety of motif detection tasks. Haplotype inference is essential for understanding genetic variation within and among populations, with important applications to the genetic analysis of disease propensities. I discuss a Bayesian model based on a prior distribution constructed from a Dirichlet process, a nonparametric prior that provides control over the size of the unknown pool of population haplotypes, and on a likelihood function that allows statistical errors in the haplotype/genotype relationship. Both models use the “probabilistic graphical model” formalism, which exploits the conjoined capabilities of graph theory and probability theory to build complex models out of simpler pieces. I discuss the mathematical underpinnings of the models and how they formally incorporate biological prior knowledge about the data, and I present a generalized mean field theory and a generic algorithm for approximate inference on such models.
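
To make concrete how a Dirichlet process prior can control the size of an unknown pool, the following is a minimal sketch of the Chinese restaurant process, a standard sequential representation of the Dirichlet process. It is not the haplotype model from the thesis: the function names `crp_assignments` and `expected_clusters`, the concentration parameter name `alpha`, and the reading of each cluster as one distinct population haplotype are assumptions made here for illustration only.

```python
import random


def crp_assignments(n, alpha, seed=0):
    """Sample cluster labels for n items from a Chinese restaurant process.

    Illustrative only: each cluster can be read as one distinct haplotype
    in a hypothetical population pool. alpha > 0 is the concentration
    parameter; larger values favour opening new clusters.
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items currently in cluster k
    assignments = []   # assignments[i] = cluster label of item i
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts


def expected_clusters(n, alpha):
    """Expected number of clusters after n items: sum of alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    for alpha in (0.5, 5.0):
        _, counts = crp_assignments(200, alpha, seed=1)
        print(alpha, len(counts), round(expected_clusters(200, alpha), 2))
```

In this sketch the number of clusters is not fixed in advance; it grows slowly with the number of items, and `alpha` tunes how quickly. That is the sense in which such a prior gives control over the size of an unknown pool.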