Phylogenetic footprinting is a technique that identifies regulatory elements by finding unusually well-conserved regions in a set of orthologous non-coding DNA sequences from multiple species. In an earlier paper, we presented an exact algorithm that identifies the most conserved region of a set of sequences. Here, we present a number of algorithmic improvements that produce a 1000-fold speedup over the original algorithm. We also show how prior knowledge can be used to identify weaker motifs, and how to handle data sets in which only an unknown subset of the sequences contains the regulatory element. Each technique is implemented and successfully identifies a large number of known binding sites, as well as several highly conserved but uncharacterized regions.
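To make the notion of a "most conserved region" concrete, the following is a minimal brute-force sketch, not the algorithm of the paper. The abstract does not state the scoring function, so the sketch assumes a simplified one: a candidate motif of length k is scored by the sum, over all sequences, of the smallest Hamming distance between the motif and any length-k window of that sequence. The function names (`hamming`, `best_match_distance`, `most_conserved_kmer`) and the example sequences are hypothetical, and the sketch ignores the species phylogeny, prior knowledge, and the case where only a subset of sequences contains the element.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match_distance(kmer, seq):
    """Smallest Hamming distance between kmer and any window of seq.

    Assumes len(seq) >= len(kmer).
    """
    k = len(kmer)
    return min(hamming(kmer, seq[i:i + k]) for i in range(len(seq) - k + 1))


def most_conserved_kmer(seqs, k, alphabet="ACGT"):
    """Exhaustively score all |alphabet|**k candidates; return (kmer, score).

    Score = sum over sequences of the best-match distance (lower is more
    conserved). This is an assumed, simplified scoring scheme. Ties are
    resolved in favor of the lexicographically first candidate.
    """
    best = None
    for letters in product(alphabet, repeat=k):
        kmer = "".join(letters)
        score = sum(best_match_distance(kmer, s) for s in seqs)
        if best is None or score < best[1]:
            best = (kmer, score)
    return best


if __name__ == "__main__":
    # Hypothetical toy sequences sharing the exact 5-mer GACGT.
    seqs = ["TTGACGTCAT", "CCGACGTAAG", "AGACGTTTTT"]
    print(most_conserved_kmer(seqs, 5))
```

The exhaustive enumeration costs time exponential in k, which is why this is only practical for short motifs; it is meant to illustrate the search problem, not the speedups the abstract describes.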