Scalable genome scaffolding using integer linear programming

  • Authors:
  • James Lindsay; Hamed Salooti; Alex Zelikovsky; Ion Măndoiu

  • Affiliations:
  • University of Connecticut, Storrs, CT; Georgia State University, Atlanta, GA; Georgia State University, Atlanta, GA; University of Connecticut, Storrs, CT

  • Venue:
  • Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
  • Year:
  • 2012

Abstract

The rapidly diminishing cost of genome sequencing is driving renewed interest in large-scale genome sequencing programs such as Genome 10K (G10K). Despite this interest, the assembly of large genomes from short reads is still an extremely resource-intensive process. This work presents a scalable algorithm to create scaffolds, or ordered and oriented sets of assembled contigs, which is one part of a practical assembly. This is accomplished using integer linear programming (ILP). To process large mammalian genomes, we employ non-serial dynamic programming (NSDP) and a hierarchical strategy. Both existing and novel quantitative metrics are used to compare scaffolding tools and to gain deeper insight into the challenges of scaffolding. The code is available at: https://bitbucket.org/jrl03001/silp
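
To make the idea of scaffolding as a 0/1 optimization concrete, here is a minimal Python sketch. It is not the formulation from the paper: it assumes a simplified, orientation-only model in which each contig gets one binary variable (0 = forward, 1 = reverse) and each paired-read link states whether two contigs should share an orientation. The function name `best_orientations`, the link tuple layout, and the weights are hypothetical, and exhaustive search stands in for an ILP solver, so it is only practical for tiny instances.

```python
from itertools import product


def best_orientations(n_contigs, links):
    """Exhaustively solve a toy 0/1 contig-orientation problem.

    links is a list of (i, j, same, weight) tuples (hypothetical layout):
    the link is satisfied when contigs i and j have equal orientation
    if same is True, or opposite orientation if same is False.
    Returns (best total weight of satisfied links, orientation tuple).
    """
    best_score, best_assign = -1, None
    # Fix contig 0 to forward (0) to break the global flip symmetry.
    for rest in product((0, 1), repeat=n_contigs - 1):
        orient = (0,) + rest
        score = sum(
            weight
            for i, j, same, weight in links
            if (orient[i] == orient[j]) == same
        )
        if score > best_score:
            best_score, best_assign = score, orient
    return best_score, best_assign


# Toy instance: three contigs, three links, one of which conflicts.
links = [
    (0, 1, True, 5),   # contigs 0 and 1 should share an orientation
    (1, 2, False, 4),  # contigs 1 and 2 should be opposite
    (0, 2, True, 1),   # weak link that contradicts the two above
]
score, orient = best_orientations(3, links)
print(score, orient)
```

On this instance the two heavier links can be satisfied together only by dropping the weak conflicting one. The search space doubles with every added contig, which is the kind of growth that motivates decomposition strategies for large genomes.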