Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences

  • Authors:
  • Song Gao;Niranjan Nagarajan;Wing-Kin Sung

  • Affiliations:
  • NUS Graduate School for Integrative Sciences and Engineering, Singapore;Computational and Systems Biology, Genome Institute of Singapore, Singapore;Computational and Systems Biology, Genome Institute of Singapore, Singapore and School of Computing, National University of Singapore, Singapore

  • Venue:
  • RECOMB'11: Proceedings of the 15th Annual International Conference on Research in Computational Molecular Biology
  • Year:
  • 2011

Abstract

Scaffolding, the problem of ordering and orienting contigs, typically using paired-end reads, is a crucial step in the assembly of high-quality draft genomes. Even as sequencing technologies and mate-pair protocols have improved significantly, scaffolding programs still rely on heuristics, with no guarantees on the quality of the solution. In this work we explored the feasibility of an exact solution for scaffolding and present a first fixed-parameter tractable solution for assembly (Opera). We also describe a graph contraction procedure that allows the solution to scale to large scaffolding problems and demonstrate this by scaffolding several large real and synthetic datasets. In comparisons with existing scaffolders, Opera simultaneously produced longer and more accurate scaffolds, demonstrating the utility of an exact approach. Opera also incorporates an exact quadratic programming formulation to precisely compute gap sizes.
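
To make the gap-size idea concrete, the sketch below estimates gaps for an already ordered scaffold by weighted least squares, which is an unconstrained special case of quadratic programming. It is an illustration only and is not taken from Opera: the abstract does not give the formulation, so the objective, the function names (`estimate_gaps`, `solve`), and the link format `(i, j, distance, sigma)` are all assumptions made for this example.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("gap sizes are not uniquely determined by the links")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [float(M[r][n] / M[r][r]) for r in range(n)]


def estimate_gaps(contig_lengths, links):
    """Weighted least-squares gap sizes for a fixed contig order (hypothetical helper).

    contig_lengths: contig lengths in scaffold order.
    links: (i, j, distance, sigma) tuples with i < j, where `distance` is the
           estimated span from the end of contig i to the start of contig j
           and `sigma` is its standard deviation (assumed input format).
    Returns one estimated gap per adjacent contig pair.
    """
    n_gaps = len(contig_lengths) - 1
    # Normal equations A g = b for minimizing
    #   sum over links of ((sum of spanned gaps - target) / sigma)^2.
    A = [[Fraction(0)] * n_gaps for _ in range(n_gaps)]
    b = [Fraction(0)] * n_gaps
    for i, j, distance, sigma in links:
        # Gaps i .. j-1 lie between contigs i and j; intermediate contigs
        # take up part of the observed span, so subtract their lengths.
        target = Fraction(distance) - sum(contig_lengths[i + 1:j])
        weight = 1 / Fraction(sigma) ** 2
        for r in range(i, j):
            b[r] += weight * target
            for c in range(i, j):
                A[r][c] += weight
    return solve(A, b)


# Three contigs, two adjacent links and one link spanning the middle contig.
gaps = estimate_gaps(
    [1000, 500, 800],
    [(0, 1, 200, 10), (1, 2, 300, 10), (0, 2, 1020, 10)],
)
```

In this toy input the spanning link implies the two gaps sum to 520 while the adjacent links suggest 200 and 300, so the least-squares fit spreads the disagreement evenly across the three constraints. A practical formulation would likely add constraints (for example, bounds on gap sizes), which is where a general quadratic programming solver would be needed.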