Motif yggdrasil: sampling from a tree mixture model

  • Authors:
  • Samuel A. Andersson; Jens Lagergren

  • Affiliations:
  • Stockholm Bioinformatics Center and School of Computer Science and Communication, Royal Institute of Technology, Stockholm, Sweden; Stockholm Bioinformatics Center and School of Computer Science and Communication, Royal Institute of Technology, Stockholm, Sweden

  • Venue:
  • RECOMB'06 Proceedings of the 10th annual international conference on Research in Computational Molecular Biology
  • Year:
  • 2006

Abstract

In phylogenetic foot-printing, putative regulatory elements are found in the upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. The use of the tree achieves a substantial increase in the nucleotide-level correlation coefficient, both for synthetic data and for 37 bacterial lexA genes.
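
The abstract does not define the nucleotide-level correlation coefficient. The sketch below assumes the definition commonly used when assessing motif finders: a Matthews-style correlation between the known and the predicted site positions, computed per nucleotide. The function name `nucleotide_cc`, the 0/1 mask representation, and the choice to return 0.0 when the coefficient is undefined are all assumptions made for illustration, not details taken from the paper.

```python
import math


def nucleotide_cc(true_mask, pred_mask):
    """Nucleotide-level correlation coefficient between two 0/1 masks.

    Each mask has one entry per nucleotide: 1 if the position lies inside
    a (known or predicted) motif site, 0 otherwise. The mask layout is an
    assumption made for this sketch.
    """
    if len(true_mask) != len(pred_mask):
        raise ValueError("masks must have equal length")

    # Count the four outcomes position by position.
    tp = sum(1 for t, p in zip(true_mask, pred_mask) if t and p)
    tn = sum(1 for t, p in zip(true_mask, pred_mask) if not t and not p)
    fp = sum(1 for t, p in zip(true_mask, pred_mask) if not t and p)
    fn = sum(1 for t, p in zip(true_mask, pred_mask) if t and not p)

    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        # The coefficient is undefined here; returning 0.0 is an assumed
        # convention for this sketch.
        return 0.0
    return (tp * tn - fn * fp) / denom


known = [0, 1, 1, 0]
print(nucleotide_cc(known, [0, 1, 1, 0]))  # perfect overlap
print(nucleotide_cc(known, [1, 0, 0, 1]))  # complete disagreement
```

Under this definition, a value of 1 means the predicted sites coincide exactly with the known sites, 0 means no correlation, and -1 means the prediction is the exact complement.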