Motivation: Motif detection for DNA sequences has many important applications in biological studies, e.g., locating binding sites and regulatory signals and designing genetic probes. In this paper, we propose a randomized algorithm, design an improved EM algorithm, and combine the two into a software tool.

Results: (1) We design a randomized algorithm for the consensus pattern problem. We show that, with high probability, our randomized algorithm finds a pattern in polynomial time with cost error at most ε × l for each string, where l is the length of the motif and ε can be any positive number given by the user. (2) We design an improved EM (Expectation Maximization) algorithm that outperforms the original EM algorithm. (3) We develop a software tool, MotifDetector, that uses our randomized algorithm to find good seeds and the improved EM algorithm to do local search. We compare MotifDetector with Buhler and Tompa's PROJECTION, which is considered to be the best known software for motif detection. Simulations show that MotifDetector is slower than PROJECTION when the pattern length is relatively small, and outperforms PROJECTION when the pattern length becomes large.

Availability: Free from http://www.cs.cityu.edu.hk/~lwang/software/motif/index.html, subject to copyright restrictions.
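
To make the "seed, then local search" idea concrete, the following is a minimal Python sketch of the consensus pattern objective and a simple refinement loop. It is not the randomized algorithm or the improved EM algorithm of the paper, whose details the abstract does not give. It assumes the usual formulation of the consensus pattern problem (choose one length-l window per sequence and a pattern that minimizes the total Hamming distance) and swaps in a hard-assignment refinement, in which each sequence contributes only its single best window, in place of EM. All function names (`hamming`, `best_window`, `consensus_cost`, `column_consensus`, `refine`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_window(pattern, seq):
    """Length-l window of seq closest to pattern (first one on ties)."""
    l = len(pattern)
    windows = (seq[i:i + l] for i in range(len(seq) - l + 1))
    return min(windows, key=lambda w: hamming(pattern, w))


def consensus_cost(pattern, seqs):
    """Consensus pattern objective: sum over sequences of the distance
    from pattern to its closest window in that sequence."""
    return sum(hamming(pattern, best_window(pattern, s)) for s in seqs)


def column_consensus(windows):
    """Most frequent letter in each column of the aligned windows."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*windows))


def refine(seed, seqs, max_rounds=20):
    """Hard-assignment local search (an assumption, not the paper's EM):
    re-pick the best window per sequence, take the column-wise consensus,
    and stop as soon as the cost no longer improves."""
    pattern, cost = seed, consensus_cost(seed, seqs)
    for _ in range(max_rounds):
        windows = [best_window(pattern, s) for s in seqs]
        candidate = column_consensus(windows)
        candidate_cost = consensus_cost(candidate, seqs)
        if candidate_cost >= cost:
            break
        pattern, cost = candidate, candidate_cost
    return pattern, cost


if __name__ == "__main__":
    seqs = ["TTACGTTT", "ACGTGGGG", "CCCCACGA"]
    print(refine("ACGA", seqs))
```

In this sketch the seed is passed in by hand. In a pipeline of the kind the abstract describes, the seed would come from a separate seeding step, and the refinement would use soft, probabilistic window assignments rather than a single best window per sequence.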