Build a dictionary, learn a grammar, decipher stegoscripts, and discover genomic regulatory elements

Authors:
Guandong Wang; Weixiong Zhang
Affiliations:
Department of Computer Science and Engineering, Washington University in Saint Louis, Saint Louis, MO; Department of Computer Science and Engineering, Washington University in Saint Louis, Saint Louis, MO and Department of Genetics
Venue:
RECOMB'05 Proceedings of the 2005 joint annual satellite conference on Systems biology and regulatory genomics
Year:
2005

Abstract

It has been a challenge to discover transcription factor (TF) binding motifs (TFBMs), which are short cis-regulatory DNA sequences playing essential roles in transcriptional regulation. We approach the problem of discovering TFBMs from a steganographic perspective. We view the regulatory regions of a genome as if they constituted a stegoscript with conserved words (i.e., TFBMs) being embedded in a covertext, and model the stegoscript with a statistical model consisting of a dictionary and a grammar. We develop an efficient algorithm, WordSpy, to learn such a model from a stegoscript and to recover conserved motifs. Subsequently, we select biologically meaningful motifs based on a motif's specificity to the set of genes of interest and/or the expression coherence of the genes whose promoters contain the motif. From the promoters of 645 distinct cell-cycle related genes of S. cerevisiae, our method is able to identify all known cell-cycle related TFBMs among its top ranking motifs. Our method can also be directly applied to discriminative motif finding. By utilizing the ChIP-chip data of Lee et al., we predicted potential binding motifs of 113 known transcription factors of budding yeast.
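
The abstract's idea of ranking a candidate word by its specificity to a gene set of interest can be illustrated with a small sketch. It is not WordSpy itself and not necessarily the authors' scoring function: it assumes that specificity is measured by a hypergeometric enrichment test on the number of promoters containing the word. The function names (`count_promoters_with_word`, `hypergeom_tail`, `specificity_pvalue`) and the toy sequences are hypothetical and chosen only for illustration.

```python
from math import comb


def count_promoters_with_word(promoters, word):
    """Number of promoter sequences that contain `word` at least once."""
    return sum(1 for seq in promoters if word in seq)


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n promoters are drawn from N, K of which contain the word."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def specificity_pvalue(word, target, background):
    """Enrichment p-value of `word` in `target` relative to target + background.

    Assumption: a smaller p-value means the word is more specific to the
    target set. This is a generic statistic, not the paper's exact score.
    """
    k = count_promoters_with_word(target, word)
    K = k + count_promoters_with_word(background, word)
    N = len(target) + len(background)
    return hypergeom_tail(k, len(target), K, N)


# Toy data (made up): three "target" promoters and four "background" promoters.
target = ["TTACGCGTAA", "GGACGCGTCC", "ACGCGTTTTT"]
background = ["TTTTTTTTTT", "GGGGCCCCAA", "ACGTACGTAC", "CATCATCATC"]

p_motif = specificity_pvalue("ACGCGT", target, background)
p_other = specificity_pvalue("TTTT", target, background)
print(f"ACGCGT: {p_motif:.4f}  TTTT: {p_other:.4f}")
```

In this toy setting the word that occurs in every target promoter and in no background promoter receives the smaller p-value, so it would rank above the word that is spread across both sets. A real analysis would also need to handle reverse complements, degenerate motifs, and multiple-testing correction, none of which are covered here.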