Build a dictionary, learn a grammar, decipher stegoscripts, and discover genomic regulatory elements

  • Authors:
  • Guandong Wang; Weixiong Zhang

  • Affiliations:
  • Department of Computer Science and Engineering, Washington University in Saint Louis, Saint Louis, MO; Department of Computer Science and Engineering, Washington University in Saint Louis, Saint Louis, MO and Department of Genetics

  • Venue:
  • RECOMB'05 Proceedings of the 2005 joint annual satellite conference on Systems biology and regulatory genomics
  • Year:
  • 2005

Abstract

Discovering transcription factor (TF) binding motifs (TFBMs), short cis-regulatory DNA sequences that play essential roles in transcriptional regulation, remains a challenge. We approach the problem from a steganographic perspective. We view the regulatory regions of a genome as a stegoscript in which conserved words (i.e., TFBMs) are embedded in a covertext, and we describe the stegoscript with a statistical model consisting of a dictionary and a grammar. We develop an efficient algorithm, WordSpy, to learn such a model from a stegoscript and to recover conserved motifs. We then select biologically meaningful motifs based on a motif's specificity to the set of genes of interest and/or the expression coherence of the genes whose promoters contain the motif. From the promoters of 645 distinct cell-cycle related genes of S. cerevisiae, our method identifies all known cell-cycle related TFBMs among its top-ranking motifs. Our method can also be applied directly to discriminative motif finding. Using the ChIP-chip data of Lee et al., we predicted potential binding motifs of 113 known transcription factors of budding yeast.
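
The motif-selection step mentioned in the abstract can be illustrated with a small sketch. The abstract does not state which statistic is used to measure a motif's specificity to the genes of interest, so the code below assumes a hypergeometric enrichment test as one plausible choice; it is not WordSpy's actual scoring. The function names `hypergeom_tail` and `motif_specificity` and the toy promoter sequences are hypothetical and made up for illustration.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n promoters are drawn from N, of which K contain the motif."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def motif_specificity(motif, target, background):
    """Hypothetical specificity score (assumption, not the paper's definition):
    enrichment p-value of `motif` in the target promoters, relative to the
    pooled target and background promoters. Smaller means more specific."""
    k = sum(motif in seq for seq in target)            # target promoters with the motif
    K = k + sum(motif in seq for seq in background)    # all promoters with the motif
    n = len(target)
    N = n + len(background)
    return hypergeom_tail(k, n, K, N)


# Toy data, invented for this example only.
target = ["TTACGCGTAA", "GGACGCGTCC", "AAAACGCGTT"]
background = ["TTTTTTTTTT", "GGGGCCCCAA", "ACGCGTAAAA", "CATCATCATC"]

p = motif_specificity("ACGCGT", target, background)
print(f"enrichment p-value: {p:.4f}")
```

Under this assumed scoring, candidate words recovered from the stegoscript would be ranked by ascending p-value, so that words concentrated in the promoters of interest appear first. Exact substring matching is used here for brevity; degenerate motifs would need a matching rule that tolerates mismatches.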