PRUNER: Algorithms for Finding Monad Patterns in DNA Sequences

  • Authors:
  • Ravi VijayaSatya;Amar Mukherjee

  • Affiliations:
  • University of Central Florida;University of Central Florida

  • Venue:
  • CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present new algorithms for discovering monad patterns in DNA sequences. Monad patterns are of the form (l, d)-k, where l is the length of the pattern, d is the maximum number of mismatches allowed, and k is the minimum number of times the pattern is repeated in the given sample. The time-complexity of some of the best known algorithms to date is O(nt^2 l^d \left| \sum\right|^d ), where t is the number of input sequences, and n is the length of each input sequence. The first algorithm that we present takes O(n^2 t^2 l^{{d \mathord{\left/ {\vphantom {d 2}} \right. \kern-\nulldelimiterspace} 2}}\left| \Sigma\right|^{{d \mathord{\left/ {\vphantom {d 2}} \right. \kern-\nulldelimiterspace} 2}} ) and space 0(ntl^{{d \mathord{\left/ {\vphantom {d 2}} \right. \kern-\nulldelimiterspace} 2}} \left| \Sigma\right|^{{d \mathord{\left/ {\vphantom {d 2}} \right. \kern-\nulldelimiterspace} 2}} ), and the second algorithm takes 0(n^3 t^3 l^{{d \mathord{\left/ {\vphantom {d 2}} \right. \kern-\nulldelimiterspace} 2}} \left| \Sigma\right|^{{d \mathord{\left/ {\vphantom {d 2}} \right. \kern-\nulldelimiterspace} 2}} ) time using 0(l^{{d \mathord{\left/ {\vphantom {d 2}} \right. \kern-\nulldelimiterspace} 2}} \left| \Sigma\right|^{{d \mathord{\left/ {\vphantom {d 2}} \right. \kern-\nulldelimiterspace} 2}} ) space. In practice, our algorithms have much better performance provided the d/l ratio is small.