From de Bruijn graphs to rectangle graphs for genome assembly

  • Authors:
  • Nikolay Vyahhi; Alex Pyshkin; Son Pham; Pavel A. Pevzner

  • Affiliations:
  • Algorithmic Biology Laboratory, St. Petersburg Academic University, Russia; Algorithmic Biology Laboratory, St. Petersburg Academic University, Russia; Department of Computer Science and Engineering, UCSD, La Jolla, CA; Algorithmic Biology Laboratory, St. Petersburg Academic University, Russia, and Department of Computer Science and Engineering, UCSD, La Jolla, CA

  • Venue:
  • WABI'12: Proceedings of the 12th International Conference on Algorithms in Bioinformatics
  • Year:
  • 2012

Abstract

Jigsaw puzzles were originally constructed by painting a picture on a rectangular piece of wood and then cutting it into smaller pieces with a jigsaw. The Jigsaw Puzzle Problem is to find an arrangement of these pieces that fills the rectangle such that neighboring pieces have "matching" boundaries with respect to color and texture. While the general Jigsaw Puzzle Problem is NP-complete [6], we discuss a simpler version (called the Rectangle Puzzle Problem) and study rectangle graphs, recently introduced by Bankevich et al. (2012) [3], for assembling such puzzles. We establish the connection between the Rectangle Puzzle Problem and the problem of assembling genomes from read-pairs, and extend the analysis in [3] to the practical challenges encountered when applying rectangle graphs to genome assembly. We demonstrate that addressing these challenges results in an assembler, SPAdes+, that improves on existing assembly algorithms for bacterial genomes (including the particularly difficult case of genome assemblies from single cells). SPAdes+ is freely available from http://bioinf.spbau.ru/spades.