Mining poly-regions in DNA

  • Authors:
  • Panagiotis Papapetrou; Gary Benson; George Kollios

  • Affiliations:
  • Department of Information and Computer Science, Aalto University 00076, Finland; Departments of Biology and Computer Science, Boston University, MA 02215, USA; Computer Science Department, Boston University, MA 02215, USA

  • Venue:
  • International Journal of Data Mining and Bioinformatics
  • Year:
  • 2012

Abstract

We study the problem of mining poly-regions in DNA. A poly-region is defined as a bursty DNA area, i.e., an area in which a DNA pattern occurs with elevated frequency. We introduce a general formulation that covers a range of meaningful types of poly-regions and develop three efficient detection methods. The first is entropy-based and applies recursive segmentation. The second uses a set of sliding windows that summarize each sequence segment using several statistics. The third employs a technique based on majority vote. The proposed algorithms are evaluated on DNA sequences of four different organisms in terms of recall and runtime.
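
To make the notion of a bursty area concrete, the following is a minimal sketch of a single-window density scan in Python. It is not the authors' method: the abstract does not give the algorithms, and the paper's sliding-window approach uses a set of windows and several statistics. The function name `find_poly_regions` and the parameters `window` and `min_density` are assumptions introduced here for illustration only.

```python
def find_poly_regions(seq, pattern, window, min_density):
    """Illustrative only (hypothetical helper, not the paper's algorithm).

    Returns merged half-open intervals (start, end) covering every window
    of length `window` in which the fraction of start positions that begin
    an occurrence of `pattern` is at least `min_density`.
    """
    n, k = len(seq), len(pattern)
    if k == 0 or k > window or window > n:
        return []

    # hits[i] is 1 when `pattern` occurs in `seq` starting at position i.
    hits = [1 if seq.startswith(pattern, i) else 0 for i in range(n - k + 1)]

    # Number of start positions whose occurrence fits fully inside a window.
    starts_per_window = window - k + 1

    regions = []
    count = sum(hits[:starts_per_window])  # occurrences in the first window
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one: add the new rightmost start, drop the old leftmost.
            count += hits[start + starts_per_window - 1] - hits[start - 1]
        if count / starts_per_window >= min_density:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1][1] = end  # overlaps or touches the previous region
            else:
                regions.append([start, end])
    return [tuple(r) for r in regions]


# Toy input: a run of ten A's embedded in a background with sparse A's.
seq = "ACGT" * 3 + "A" * 10 + "CGTC" * 3
print(find_poly_regions(seq, "A", window=8, min_density=0.75))
```

On this toy input the scan reports one region, `(10, 24)`, which spans the run of A's plus the flanking positions that still fall inside a qualifying window. A fixed window and threshold is the simplest possible choice; it is shown only to illustrate what "elevated frequency of a pattern" means operationally.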