Filling scaffolds with gene repetitions: maximizing the number of adjacencies

  • Authors:
  • Haitao Jiang; Farong Zhong; Binhai Zhu

  • Affiliations:
  • Department of Computer Science, Montana State University, Bozeman, MT and School of Computer Science and Technology, Shandong University, Jinan, China; College of Computing, Zhejiang Normal University, Jinhua, Zhejiang, China; Department of Computer Science, Montana State University, Bozeman, MT

  • Venue:
  • CPM'11: Proceedings of the 22nd Annual Conference on Combinatorial Pattern Matching
  • Year:
  • 2011

Abstract

In genome sequencing there is a trend toward leaving whole-genome sequences incomplete. Motivated by this, Muñoz et al. recently studied the (one-sided) problem of filling an incomplete multichromosomal genome (or scaffold) H with respect to a complete target genome C such that the resulting genomic (or double-cut-and-join, DCJ for short) distance between H′ and C is minimized, where H′ is the corresponding filled scaffold. Jiang et al. recently extended this result to both the breakpoint distance and the DCJ distance, and to the (two-sided) case in which C also has some missing genes, and solved all these problems in polynomial time. However, when H and C contain duplicated genes, the corresponding breakpoint distance problem becomes NP-complete, and no efficient approximation or FPT algorithm has been known for it. In this paper, we mainly consider the one-sided problem of filling scaffolds with gene repetitions so as to maximize the number of adjacencies between the two resulting sequences; namely, given an incomplete genome I and a complete genome G, both with gene repetitions, fill in the missing genes to obtain I′ such that the number of adjacencies between I′ and G is maximized. We prove that this problem is also NP-complete and present an efficient 1.33-approximation for it. The hardness result also holds for the two-sided problem, for which a trivial factor-2 approximation exists. We also present FPT algorithms for some special cases of this problem.
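
To make the one-sided problem statement concrete, the following is a minimal sketch in Python. It is not the paper's 1.33-approximation or any of its FPT algorithms. It rests on two assumptions that the abstract does not spell out: genomes are modeled as single unsigned sequences, and the number of adjacencies between two sequences is taken to be the size of the multiset intersection of their unordered neighboring pairs. The function names (adjacency_multiset, common_adjacencies, best_filling) are hypothetical, and the search is exhaustive, so it is only usable on very small instances.

```python
from collections import Counter
from itertools import permutations


def adjacency_multiset(seq):
    """Multiset of unordered neighboring pairs of seq (assumed adjacency model)."""
    return Counter(tuple(sorted(pair)) for pair in zip(seq, seq[1:]))


def common_adjacencies(s, t):
    """Size of the multiset intersection of the adjacencies of s and t."""
    return sum((adjacency_multiset(s) & adjacency_multiset(t)).values())


def is_subsequence(small, big):
    """True if small can be obtained from big by deleting elements."""
    it = iter(big)
    return all(gene in it for gene in small)


def best_filling(incomplete, complete):
    """Exhaustive search (exponential time, illustration only).

    Tries every distinct arrangement of the genes of `complete` that keeps
    `incomplete` as a subsequence, i.e. every way of inserting the missing
    genes, and returns (score, filled_sequence) with the maximum number of
    common adjacencies, or None if no valid filling exists.
    """
    best = None
    for candidate in set(permutations(complete)):
        if is_subsequence(incomplete, candidate):
            score = common_adjacencies(candidate, complete)
            if best is None or score > best[0]:
                best = (score, candidate)
    return best


if __name__ == "__main__":
    G = list("abcab")   # complete genome with repeated genes a and b
    I = list("acb")     # incomplete genome; one a and one b are missing
    print(best_filling(I, G))
```

With G = abcab and I = acb, the missing genes are one a and one b. Inserting them so that I′ equals G recovers all 4 adjacencies of G, so the best score reported for this instance is 4. Because the problem is NP-complete, this brute-force enumeration serves only to illustrate the objective function.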