A new approach to fragment assembly in DNA sequencing

  • Authors:
  • Pavel A. Pevzner; Haixu Tang; Michael S. Waterman

  • Affiliations:
  • Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA; Department of Mathematics, University of Southern California, Los Angeles, CA; Department of Mathematics, University of Southern California, Los Angeles, CA

  • Venue:
  • RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
  • Year:
  • 2001

Abstract

For the last twenty years, fragment assembly in DNA sequencing has followed the “overlap - layout - consensus” paradigm that is used in all currently available assembly tools. Although this approach proved to be useful in assembling clones, it faces difficulties in genomic shotgun assembly: the existing algorithms make assembly errors and are often unable to resolve repeats even in prokaryotic genomes. Biologists are well aware of these errors and are forced to carry out additional experiments to verify the assembled contigs.

We abandon the classical “overlap - layout - consensus” approach in favor of a new Eulerian Superpath approach that, for the first time, resolves the problem of repeats in fragment assembly. Our main result is the reduction of fragment assembly to a variation of the classical Eulerian path problem. This reduction opens new possibilities for repeat resolution and allows one to generate error-free solutions to large-scale fragment assembly problems. The major improvement of EULER over other algorithms is that it resolves all repeats except long perfect repeats, which are theoretically impossible to resolve without additional experiments.
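
To make the idea of reducing assembly to an Eulerian path problem concrete, here is a minimal Python sketch. It is not the EULER algorithm from the paper: it assumes error-free reads, ignores k-mer multiplicity, and does not attempt the Eulerian Superpath step that handles repeats. The function names (`de_bruijn_edges`, `eulerian_path`, `assemble`) and the toy reads are illustrative assumptions. Each distinct k-mer is treated as an edge from its (k-1)-prefix to its (k-1)-suffix, and a path using every edge once is found with Hierholzer's algorithm.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Collect the distinct k-mers of all reads (assumes error-free reads)."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return sorted(kmers)


def eulerian_path(kmers):
    """Hierholzer's algorithm on the graph whose edges are the k-mers."""
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1
    # Start at the node with one surplus outgoing edge, if there is one;
    # otherwise (a cycle) start at the prefix of the first k-mer.
    start = next((n for n, b in balance.items() if b == 1), kmers[0][:-1])
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path


def assemble(reads, k):
    """Spell the sequence along an Eulerian path (toy illustration only)."""
    kmers = de_bruijn_edges(reads, k)
    path = eulerian_path(kmers)
    if len(path) != len(kmers) + 1:
        raise ValueError("no Eulerian path uses every k-mer")
    return path[0] + "".join(node[-1] for node in path[1:])


# Hypothetical overlapping reads drawn from the toy sequence ATGGCGTGCA.
reads = ["ATGGCG", "GGCGTG", "GCGTGCA"]
print(assemble(reads, 4))
```

With k = 4 this toy graph is a simple chain, so the path is unique. With k = 3 the same sequence produces branching nodes and more than one valid Eulerian path, which is a small-scale picture of the repeat ambiguity the abstract refers to.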