Brief Communication: Whole genome assembly from 454 sequencing output via modified DNA graph concept

  • Authors:
  • Jacek Blazewicz; Marcin Bryja; Marek Figlerowicz; Piotr Gawron; Marta Kasprzak; Edward Kirton; Darren Platt; Jakub Przybytek; Aleksandra Swiercz; Lukasz Szajkowski

  • Affiliations:
  • Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland and Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poz ...;Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland;Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland;Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland;Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland and Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poz ...;Lawrence Livermore National Laboratory, Joint Genome Institute, 7000 East Avenue, Livermore, CA 94550, USA;Lawrence Livermore National Laboratory, Joint Genome Institute, 7000 East Avenue, Livermore, CA 94550, USA;Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland;Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland and Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poz ...;Lawrence Livermore National Laboratory, Joint Genome Institute, 7000 East Avenue, Livermore, CA 94550, USA

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2009

Abstract

Recently, 454 Life Sciences Corporation proposed a new biochemical approach to DNA sequencing (454 sequencing), based on the pyrosequencing protocol. 454 sequencing aims to give reliable output at low cost and in a short time. The reads it produces are shorter than those produced by classical methods. Our paper proposes a new DNA assembly algorithm, SR-ASM, which deals well with such data and outperforms other assembly algorithms used in practice. SR-ASM is a heuristic method based on a graph model, the graph being a modification of the DNA graph proposed for DNA sequencing by hybridization. Other new features of the algorithm include temporary compression of input sequences and a new, fast multiple alignment heuristic that takes advantage of the way 454 sequencing output is presented and coded. The usefulness of the algorithm was demonstrated in tests on raw data generated during sequencing of the whole 1.84 Mbp genome of the bacterium Prochlorococcus marinus and on a part of chromosome 15 of Homo sapiens. Among publicly available assemblers, our algorithm generated the best results, especially in the number of contigs produced and in the lengths of contigs with high similarity to the genome sequence. The source code of SR-ASM can be downloaded from http://bio.cs.put.poznan.pl/ in the section 'Current research' - 'DNA Assembly'.
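
The abstract gives no implementation details, so the following is only a minimal sketch of two general ideas it mentions: compressing reads before comparing them, and linking reads in a graph by their overlaps. It is not the SR-ASM algorithm. It assumes, purely for illustration, that compression means collapsing homopolymer runs (a common treatment of pyrosequencing reads, where run lengths are the least reliable part of the signal) and that graph arcs come from exact suffix-prefix overlaps. All function names and the min_overlap parameter are hypothetical.

```python
from itertools import groupby


def compress_homopolymers(read):
    """Collapse each run of identical bases into a (base, run_length) pair.

    Assumption for illustration only: the abstract does not say how
    SR-ASM compresses its input.
    """
    return [(base, len(list(run))) for base, run in groupby(read)]


def decompress(pairs):
    """Restore the original read from its (base, run_length) pairs."""
    return "".join(base * count for base, count in pairs)


def collapsed(read):
    """Return the read with every homopolymer run reduced to one base."""
    return "".join(base for base, _ in compress_homopolymers(read))


def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of left equal to a prefix of right.

    Returns 0 if no overlap of at least min_overlap exists.
    Expects min_overlap >= 1.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_overlap=3):
    """Build a directed graph: an arc i -> j means read i overlaps read j.

    Overlaps are computed on the collapsed reads, so differences in
    homopolymer run lengths do not prevent two reads from being linked.
    """
    keys = [collapsed(read) for read in reads]
    graph = {i: [] for i in range(len(reads))}
    for i, left in enumerate(keys):
        for j, right in enumerate(keys):
            if i != j:
                k = overlap_length(left, right, min_overlap)
                if k:
                    graph[i].append((j, k))
    return graph


if __name__ == "__main__":
    reads = ["AAACGGT", "CGTTA"]
    print(compress_homopolymers(reads[0]))
    print(build_overlap_graph(reads))
```

The compression is temporary in the sense that the run lengths are kept alongside the collapsed bases, so the original read can be restored with decompress once overlaps have been found. The all-pairs comparison here is quadratic in the number of reads and is meant only to make the idea concrete, not to scale to a whole genome.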