Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization
Machine Learning - Special issue on applications in molecular biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Finding similar regions in many strings
STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Distinguishing string selection problems
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
On the closest string and substring problems
Journal of the ACM (JACM)
Finding similar regions in many sequences
Journal of Computer and System Sciences - STOC 1999
Combinatorial Approaches to Finding Subtle Signals in DNA Sequences
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Randomized algorithms for motif detection
ISAAC'04 Proceedings of the 15th international conference on Algorithms and Computation
Detecting Motifs in a Large Data Set: Applying Probabilistic Insights to Motif Finding
BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology
Hi-index | 0.00 |
Motivation: Motif identification for sequences has many important applications in biological studies, e.g., diagnostic probe design, locating binding sites and regulatory signals, and potential drug target identification. There are two versions. 1. Single Group: Given a group of n sequences, find a length-l motif that appears in each of the given sequences and those occurrences of the motif are similar. 2. Two Groups: Given two groups of sequences B and G, find a length-l (distinguishing) motif that appears in every sequence in B and does not appear in anywhere of the sequences in G. Here the occurrences of the motif in the given sequences have errors. Currently, most of existing programs can only handle the case of single group. Moreover, it is very difficult to use edit distance (allowing indels and replacements) for motif detection. Results: (1) We propose a randomized algorithm for the one group problem that can handle indels in the occurrences of the motif. (2) We give an algorithm for the two groups problem. (3) Extensive simulations have been done to evaluate the algorithms.