Efficient algorithms for model-based motif discovery from multiple sequences

Authors:
Bin Fu;Ming-Yang Kao;Lusheng Wang
Affiliations:
Dept. of Computer Science, University of Texas-Pan American, TX;Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL;Department of Computer Science, The City University of Hong Kong, Kowloon, Hong Kong
Venue:
TAMC'08 Proceedings of the 5th international conference on Theory and applications of models of computation
Year:
2008

Abstract

We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet Σ. A motif G = g_1g_2...g_m is a string of m characters. A randomly generated approximate copy of G is implanted into each background sequence. For a randomly generated approximate copy b_1b_2...b_m of G, every character is randomly generated such that the probability that b_i ≠ g_i is at most α. In this paper, we give the first analytical proof that multiple background sequences do help in finding subtle and faint motifs.
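
The generative model described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions that the abstract does not state: the function name `implant_motif`, a common sequence length `n`, a uniform background distribution over a DNA alphabet, a uniformly random implant position, and mutation to a uniformly chosen different character with probability exactly `alpha` (which satisfies the "at most α" condition). It is a sketch of the model only, not of the paper's discovery algorithms.

```python
import random


def implant_motif(motif, k, n, alpha, alphabet="ACGT", seed=None):
    """Generate k background sequences, each with an approximate copy of motif.

    Assumptions (not from the source): every sequence has length n, background
    characters are uniform over `alphabet`, and the implant position is uniform.
    """
    m = len(motif)
    if n < m:
        raise ValueError("sequence length n must be at least the motif length")
    rng = random.Random(seed)
    sequences, positions = [], []
    for _ in range(k):
        # Background: each character is drawn at random from the alphabet.
        background = [rng.choice(alphabet) for _ in range(n)]
        # Approximate copy: each position differs from the motif with
        # probability alpha, independently of the other positions.
        copy = []
        for g in motif:
            if rng.random() < alpha:
                copy.append(rng.choice([c for c in alphabet if c != g]))
            else:
                copy.append(g)
        # Implant the copy at a random position, overwriting the background.
        start = rng.randrange(n - m + 1)
        background[start:start + m] = copy
        sequences.append("".join(background))
        positions.append(start)
    return sequences, positions


if __name__ == "__main__":
    seqs, starts = implant_motif("ACGTTGCA", k=4, n=30, alpha=0.2, seed=0)
    for s, p in zip(seqs, starts):
        print(p, s)
```

A motif discovery program would be given only `sequences` and asked to recover the motif; the returned `positions` serve as ground truth for checking its output.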