Minimum interval cover and its application to genome sequencing

  • Authors:
  • Liang Ding; Bin Fu; Binhai Zhu

  • Affiliations:
  • Department of Computer Science, University of Texas-Pan American, Edinburg, TX; Department of Computer Science, University of Texas-Pan American, Edinburg, TX; Department of Computer Science, Montana State University, Bozeman, MT

  • Venue:
  • COCOA'11: Proceedings of the 5th International Conference on Combinatorial Optimization and Applications
  • Year:
  • 2011

Abstract

Pairwise end sequencing is a very useful method for whole genome sequencing, which determines the complete DNA sequence of an organism's genome with the help of laboratory processes. The paired-end interval cover problem is derived from pairwise end sequencing. A paired-end interval for a sequence S consists of at most two disjoint intervals, and the paired-end interval cover problem can be stated as follows: given a family F of paired-end intervals, find the minimum number of paired-end intervals of F that cover S. We prove that the paired-end interval cover problem is NP-complete. The c-interval cover problem is a generalization of paired-end interval cover that allows each member of the family F to have at most c disjoint intervals. It is a reasonable extension of the classical set-cover problem. We show that the problem is APX-hard when c ≥ 3. To solve these problems, we present a polynomial-time 6c-approximation algorithm for the c-interval cover problem and a fixed-parameter tractable algorithm for the k-bounded c-interval cover problem. Our implementation results show that the approximation ratio is much smaller than the theoretical bound for most real examples.
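
To make the problem statement concrete, here is a minimal Python sketch of a c-interval cover instance. It is not the 6c-approximation algorithm or the fixed-parameter algorithm from the paper, whose details the abstract does not give. It assumes S is the set of integer positions 0..n-1 and that each member of F is a tuple of closed intervals; the names `positions`, `greedy_cover`, and `exact_cover` are hypothetical. The greedy routine is the standard set-cover heuristic, and the brute-force routine is included only to check small instances.

```python
from itertools import combinations

def positions(member):
    """Return the integer positions covered by one member of F.

    A member is a tuple of closed intervals (lo, hi); a paired-end
    interval has at most two of them, a c-interval at most c.
    """
    return {p for lo, hi in member for p in range(lo, hi + 1)}

def greedy_cover(n, family):
    """Standard greedy set-cover heuristic (an assumption, not the paper's method).

    Returns the indices of the chosen members, or None if S = {0..n-1}
    cannot be covered by the family.
    """
    uncovered = set(range(n))
    covered_by = [positions(m) for m in family]
    chosen = []
    while uncovered:
        # Pick the member that covers the most still-uncovered positions.
        best = max(range(len(family)), key=lambda i: len(covered_by[i] & uncovered))
        if not covered_by[best] & uncovered:
            return None  # no member makes progress, so no cover exists
        chosen.append(best)
        uncovered -= covered_by[best]
    return chosen

def exact_cover(n, family):
    """Brute-force minimum cover; exponential time, for tiny instances only."""
    target = set(range(n))
    covered_by = [positions(m) for m in family]
    for k in range(1, len(family) + 1):
        for combo in combinations(range(len(family)), k):
            if target <= set().union(*(covered_by[i] for i in combo)):
                return list(combo)
    return None

# Illustrative instance (made up): S has 10 positions, F has 5 paired-end intervals.
family = [
    ((0, 2), (7, 9)),  # two disjoint pieces
    ((3, 6),),         # a single piece is also allowed ("at most two")
    ((0, 4),),
    ((5, 9),),
    ((2, 3), (6, 7)),
]
print(greedy_cover(10, family))
print(exact_cover(10, family))
```

On this toy instance both routines select members 0 and 1. In general the greedy heuristic is not guaranteed to find a minimum cover, which is why the paper studies approximation ratios.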