PDC: pattern discovery with confidence in DNA sequences

  • Authors:
  • Yi Lu; Shiyong Lu; Farshad Fotouhi; Yan Sun; Zijiang Yang; Lily R. Liang

  • Affiliations:
  • Department of Computer Science, Wayne State University, Detroit, MI; Department of Computer Science, Wayne State University, Detroit, MI; Department of Computer Science, Wayne State University, Detroit, MI; Department of Computer Science, Wayne State University, Detroit, MI; Department of Computer Science, Western Michigan University, Kalamazoo, MI; Department of Computer Science, University of the District of Columbia, Washington, DC

  • Venue:
  • ACST'06: Proceedings of the 2nd IASTED International Conference on Advances in Computer Science and Technology
  • Year:
  • 2006

Abstract

Pattern discovery in DNA sequences is one of the most challenging tasks in molecular biology and computer science. Its main goal is to identify sequences with important biological functions hidden in large volumes of genomic sequence data. Several methods and techniques have been proposed and implemented in this field. However, to reduce computational time and complexity, most of them either focus on finding short DNA patterns or require the pattern length to be specified in advance. Scientists need to find longer patterns without specifying pattern lengths in advance, while still achieving good performance.

In this paper, we propose a pattern discovery algorithm called Pattern Discovery with Confidence (PDC). Based on biological studies, we propose a new measurement system that can identify overrepresented patterns in DNA sequences. Using this measurement, the PDC algorithm narrows the search space by checking dependency along the pattern, and so extends each pattern as far as possible without the need to restrict or specify its length in advance. Experimental tests demonstrate that this approach can find long, interesting patterns within a reasonable computation time.
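
The abstract does not define the PDC measurement itself, so the following is only a minimal sketch of the general idea of extending a pattern while a dependency check holds. It assumes a simple stand-in measure: the confidence of an extension is the fraction of occurrences of the current pattern that are followed by the chosen base. The function names, the `min_conf` and `min_support` parameters, and the toy sequences are all hypothetical and are not taken from the paper.

```python
def count_occurrences(sequences, pattern):
    """Count occurrences of pattern across all sequences, overlaps included."""
    total = 0
    for seq in sequences:
        start = seq.find(pattern)
        while start != -1:
            total += 1
            start = seq.find(pattern, start + 1)
    return total


def extend_pattern(sequences, seed, min_conf=0.8, min_support=2):
    """Greedily extend seed to the right, one base at a time.

    Assumed stand-in measure (not the paper's): an extension is accepted
    only if the most frequent next base follows at least min_conf of the
    current pattern's occurrences and is seen at least min_support times.
    """
    pattern = seed
    support = count_occurrences(sequences, pattern)
    while support >= min_support:
        best_base, best_count = None, 0
        for base in "ACGT":
            c = count_occurrences(sequences, pattern + base)
            if c > best_count:
                best_base, best_count = base, c
        # Stop when no base follows, or the dependency is too weak.
        if best_base is None or best_count < min_support:
            break
        if best_count / support < min_conf:
            break
        pattern += best_base
        support = best_count
    return pattern, support


# Toy input: three sequences sharing the motif ACGTGAC.
toy_sequences = ["TTACGTGACCA", "GGACGTGACTT", "CAACGTGACAG"]
result = extend_pattern(toy_sequences, "ACG")
print(result)
```

On this toy input the seed "ACG" grows for as long as all three sequences agree on the next base and stops where they diverge, so no pattern length has to be supplied up front. A real implementation would also need left extension, seed selection, and a statistically grounded overrepresentation measure, none of which the abstract specifies.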