Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization
Machine Learning - Special issue on applications in molecular biology
Finding motifs using random projections
RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
Combinatorial Approaches to Finding Subtle Signals in DNA Sequences
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites
WABI '01 Proceedings of the First International Workshop on Algorithms in Bioinformatics
Spelling Approximate Repeated or Common Motifs Using a Suffix Tree
LATIN '98 Proceedings of the Third Latin American Symposium on Theoretical Informatics
cWINNOWER Algorithm for Finding Fuzzy DNA Motifs
CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Finding motifs for insufficient number of sequences with strong binding to transcription facto
RECOMB '04 Proceedings of the eighth annual international conference on Resaerch in computational molecular biology
DNA Motif Representation with Nucleotide Dependency
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Optimizing genetic algorithm for motif discovery
Mathematical and Computer Modelling: An International Journal
Hi-index | 0.00 |
Finding similar patterns (motifs) in a set of sequences is an important problem in Computational Molecular Biology. Pevzner and Sze [18] defined the planted (l,d)-motif problem as trying to find a length-l pattern that occurs in each input sequence with at most d substitutions. When d is large, this problem is difficult to solve because the input sequences do not contain enough information on the motif. In this paper, we propose a generalized planted (l,d)-motif problem which considers as input an additional set of sequences without any substring similar to the motif (negative set) as extra information. We analyze the effects of this negative set on the finding of motifs, and define a set of unsolvable problems and another set of most difficult problems, known as “challenging generalized problems”. We develop an algorithm called VANS based on voting and other novel techniques, which can solve the (9,3), (11,4),(15,6) and (20,8)-motif problems which were unsolvable before as well as challenging problems of the planted (l,d)-motif problem such as (9,2), (11,3), (15,5) and (20,7)-motif problems.