A fast, alignment-free, conservation-based method for transcription factor binding site discovery

Authors:
Raluca Gordân, Leelavati Narlikar, Alexander J. Hartemink
Affiliations:
Department of Computer Science, Duke University, Durham, NC (all authors)
Venue:
RECOMB'08: Proceedings of the 12th Annual International Conference on Research in Computational Molecular Biology
Year:
2008

Abstract

As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery use global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related organisms, and thus more likely to be functional. Because binding sites are usually short, sometimes degenerate, and often orientation-independent, alignment algorithms may not align them correctly. Here, we present a novel, alignment-free approach for incorporating conservation information into TF motif discovery. We relax the definition of conserved sites: we consider a DNA site within a regulatory region to be conserved in an orthologous sequence if it occurs anywhere in that sequence, irrespective of orientation. We use this definition to derive informative priors over DNA sequence positions, and incorporate these priors into a Gibbs sampling algorithm for motif discovery. Our approach is simple and fast. It requires neither sequence alignments nor knowledge of the phylogenetic relationships between the orthologous sequences, yet it is more effective on real biological data than methods that do.
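The relaxed conservation test, and one way to turn it into a positional prior for a Gibbs sampler, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The scoring rule is hypothetical: the fraction of orthologs in which the W-mer at a position is conserved, plus a small pseudocount `eps`, normalized over positions. So are the PWM representation (a list of per-column base-probability dicts) and all function names (`is_conserved`, `conservation_prior`, `site_log_odds`, `sample_position`). The update step draws a site start with probability proportional to the prior times the motif-versus-background likelihood ratio. This is a common way to fold a positional prior into a sampler. Only that single draw is shown; the full sampler would also re-estimate the PWM from the sites in the other sequences at each iteration.

```python
import math
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_conserved(site: str, ortholog: str) -> bool:
    """Relaxed conservation: the site occurs anywhere in the ortholog, on either strand."""
    return site in ortholog or reverse_complement(site) in ortholog


def conservation_prior(seq: str, orthologs: list[str], width: int,
                       eps: float = 1e-3) -> list[float]:
    """Hypothetical prior over start positions of width-`width` sites in `seq`.

    Each position is scored by the fraction of orthologs in which its W-mer is
    conserved (relaxed definition), plus a pseudocount `eps` so unconserved
    positions stay possible; scores are normalized to sum to 1.
    """
    n_pos = len(seq) - width + 1
    scores = []
    for i in range(n_pos):
        site = seq[i:i + width]
        hits = sum(is_conserved(site, o) for o in orthologs)
        frac = hits / len(orthologs) if orthologs else 0.0
        scores.append(frac + eps)
    total = sum(scores)
    return [s / total for s in scores]


def site_log_odds(site: str, pwm: list[dict[str, float]],
                  background: dict[str, float]) -> float:
    """log P(site | PWM) - log P(site | background), assuming independent columns."""
    return sum(math.log(pwm[j][b]) - math.log(background[b])
               for j, b in enumerate(site))


def sample_position(seq: str, pwm: list[dict[str, float]],
                    background: dict[str, float], prior: list[float],
                    rng: random.Random) -> int:
    """One Gibbs-style draw of a site start, weighted by prior[i] * likelihood ratio."""
    width = len(pwm)
    log_w = [math.log(prior[i]) + site_log_odds(seq[i:i + width], pwm, background)
             for i in range(len(prior))]
    top = max(log_w)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(x - top) for x in log_w]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


if __name__ == "__main__":
    seq = "TTGACGTCAAGG"
    orthologs = ["CCTGACGTCACC", "GGGTTTAAACCC"]
    prior = conservation_prior(seq, orthologs, width=6)
    # Hypothetical, sharply peaked motif model with consensus GACGTC.
    pwm = [{b: (0.97 if b == c else 0.01) for b in "ACGT"} for c in "GACGTC"]
    background = {b: 0.25 for b in "ACGT"}
    rng = random.Random(0)
    print([round(p, 3) for p in prior])
    print(sample_position(seq, pwm, background, prior, rng))
```

In this toy input, the 6-mers at positions 1 to 3 (`TGACGT`, `GACGTC`, `ACGTCA`) occur in the first ortholog, so they share the largest prior mass. The remaining positions receive only the pseudocount. No alignment or tree is involved: each ortholog is simply searched for the site or its reverse complement. With the pseudocount, an unconserved position keeps a small nonzero probability, so strong motif evidence can still outweigh a weak conservation signal.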