We present new algorithms for discovering monad patterns in DNA sequences. Monad patterns are of the form (l, d)-k, where l is the length of the pattern, d is the maximum number of mismatches allowed, and k is the minimum number of times the pattern is repeated in the given sample. The time complexity of some of the best known algorithms to date is O(n t^2 l^d |\Sigma|^d), where t is the number of input sequences, n is the length of each input sequence, and \Sigma is the alphabet. The first algorithm that we present takes O(n^2 t^2 l^{d/2} |\Sigma|^{d/2}) time and O(n t l^{d/2} |\Sigma|^{d/2}) space, and the second algorithm takes O(n^3 t^3 l^{d/2} |\Sigma|^{d/2}) time using O(l^{d/2} |\Sigma|^{d/2}) space. In practice, our algorithms have much better performance provided the d/l ratio is small.
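To make the (l, d)-k definition concrete, the following minimal Python sketch checks by brute force whether a given candidate pattern qualifies. It is an illustration of the definition only, not either of the algorithms described above. The function names (`hamming`, `count_occurrences`, `is_monad`) and the toy sample are hypothetical, and the sketch assumes that an occurrence is any length-l window in any input sequence within Hamming distance d of the pattern, counted over the whole sample.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_occurrences(pattern: str, sequences: Sequence[str], d: int) -> int:
    """Count length-l windows, over all sequences, within d mismatches of pattern.

    Assumption: occurrences are pooled across the whole sample.
    """
    l = len(pattern)
    return sum(
        1
        for seq in sequences
        for i in range(len(seq) - l + 1)
        if hamming(pattern, seq[i:i + l]) <= d
    )


def is_monad(pattern: str, sequences: Sequence[str], d: int, k: int) -> bool:
    """True if pattern is an (l, d)-k monad of the sample, with l = len(pattern)."""
    return count_occurrences(pattern, sequences, d) >= k


# Hypothetical toy sample: t = 3 sequences, each of length n = 8.
sample = ["ACGTACGA", "TTACGTTT", "ACCTGGGG"]
print(count_occurrences("ACGT", sample, 1))  # 4
print(is_monad("ACGT", sample, 1, 4))        # True
```

Checking one candidate this way costs O(n t l) time. Discovery is harder because the candidates themselves must be enumerated: the strings that differ from a given length-l window in exactly d positions already number on the order of l^d |\Sigma|^d, which is the factor that appears in the bounds quoted above.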