Using expression data to discover RNA and DNA regulatory sequence motifs

  • Authors:
  • Chaya Ben-Zaken Zilberstein; Eleazar Eskin; Zohar Yakhini

  • Affiliations:
  • Computer Science Dept., Technion; School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel; Agilent Laboratories and Computer Science Dept., Technion

  • Venue:
  • RRG'04: Proceedings of the 2004 RECOMB International Conference on Regulatory Genomics
  • Year:
  • 2004

Abstract

The combination of gene expression data and genomic sequence data can be used to help discover putative transcription factor binding sites (TFBSs). There are two major approaches to incorporating expression data into TFBS discovery. The first approach clusters genes according to their expression patterns and then searches for over-represented sequences in the promoter regions of co-expressed genes [31, 15, 14]. The second approach uses a single expression experiment and attempts to determine which transcription factors are involved in it [24, 16, 29, 12]. In this paper, we present RIM-Finder, a further development of the second approach. Our method also enables the discovery of mRNA stability motifs and of motif phrases. A phrase is either a single motif (a TFBS candidate or an RNA stability motif candidate) or a pair consisting of one motif of each type together with a logical relation between them. Our approach discovers all (potentially degenerate) phrases that are statistically significant with respect to their distribution in a ranked list of sequences, under either a non-parametric model or a Student t based model. To allow the identification of phrases that combine DNA and RNA motifs, we rank sequence pairs, each consisting of a promoter and an mRNA untranslated region (UTR). We apply RIM-Finder to discover putative phrases using cell stress response expression data, mRNA decay rate measurements, and mutant expression data in yeast.
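The abstract does not spell out RIM-Finder's exact statistics, so the following is only a minimal Python sketch of the general idea, under several assumptions. Degenerate motifs are written in IUPAC codes. A phrase is a DNA motif searched in the promoter combined with an RNA motif searched in the UTR through a logical relation (the AND / OR / AND_NOT set is an assumption). A Wilcoxon rank-sum (Mann-Whitney) z-score with a normal approximation stands in for the non-parametric model, and a Welch t statistic stands in for the Student t based model. The function names `compile_motif`, `phrase_indicator`, `rank_sum_z`, `welch_t` and the toy sequences and values are hypothetical, not taken from the paper.

```python
import math
import re
from typing import List, Sequence, Tuple

# IUPAC nucleotide codes mapped to regex character classes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def compile_motif(motif: str) -> "re.Pattern[str]":
    """Compile a degenerate IUPAC motif; U is mapped to T so RNA motifs work too."""
    seq = motif.upper().replace("U", "T")
    return re.compile("".join(IUPAC[c] for c in seq))


def phrase_indicator(pairs: Sequence[Tuple[str, str]], dna_motif: str,
                     rna_motif: str, relation: str = "AND") -> List[int]:
    """Return 1/0 per (promoter, UTR) pair: does the pair satisfy the phrase?"""
    dna_re, rna_re = compile_motif(dna_motif), compile_motif(rna_motif)
    out = []
    for promoter, utr in pairs:
        in_dna = dna_re.search(promoter.upper()) is not None
        in_rna = rna_re.search(utr.upper().replace("U", "T")) is not None
        if relation == "AND":
            hit = in_dna and in_rna
        elif relation == "OR":
            hit = in_dna or in_rna
        elif relation == "AND_NOT":
            hit = in_dna and not in_rna
        else:
            raise ValueError(f"unknown relation: {relation}")
        out.append(int(hit))
    return out


def rank_sum_z(indicator: Sequence[int]) -> Tuple[float, float]:
    """Rank-sum test; indicator[i] refers to the pair at rank i (0 = top of list).

    Returns (z, one-sided p). Positive z means phrase-carrying pairs are
    concentrated near the top. Ranks are distinct positions, so no tie correction.
    """
    n = len(indicator)
    ranks = [i + 1 for i, hit in enumerate(indicator) if hit]
    n1, n0 = len(ranks), n - len(ranks)
    if n1 == 0 or n0 == 0:
        raise ValueError("need both phrase-positive and phrase-negative pairs")
    w = sum(ranks)
    mean = n1 * (n + 1) / 2
    var = n1 * n0 * (n + 1) / 12
    z = (mean - w) / math.sqrt(var)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # normal approximation
    return z, p


def welch_t(values: Sequence[float], indicator: Sequence[int]) -> float:
    """Welch t statistic comparing values of phrase-positive vs negative pairs."""
    pos = [v for v, h in zip(values, indicator) if h]
    neg = [v for v, h in zip(values, indicator) if not h]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two values in each group")

    def mean_var(xs: List[float]) -> Tuple[float, float]:
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    m1, v1 = mean_var(pos)
    m0, v0 = mean_var(neg)
    return (m1 - m0) / math.sqrt(v1 / len(pos) + v0 / len(neg))


# Toy (promoter, UTR) pairs, already sorted by decreasing expression value.
pairs = [
    ("AATGACTCAGG", "CCUAUUUAUGG"),
    ("TTGAGTCAAA", "UAUUUAUCC"),
    ("CTGACTCATT", "GGUAUUUAUA"),
    ("AAAAAAAAAA", "CCCCCCC"),
    ("GGGGGGGG", "UAUUUAUCC"),
    ("TGACTCAAA", "GGGGGG"),
]
values = [3.0, 2.5, 2.0, 0.1, 0.0, -0.2]

ind = phrase_indicator(pairs, "TGASTCA", "UAUUUAU", "AND")
z, p = rank_sum_z(ind)
t = welch_t(values, ind)
print(ind, round(z, 3), round(p, 4), round(t, 2))
```

Here the AND phrase marks exactly the top three pairs, so the rank-sum z-score is positive and the one-sided p-value is small. A real search would enumerate many candidate motifs and relations and would need a multiple-testing correction, which this sketch omits.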