Parallel Implementation of the Novel Approach to Genome Assembly

  • Authors:
  • Jacek Blazewicz;Marta Kasprzak;Aleksandra Swiercz;Marek Figlerowicz;Piotr Gawron;Darren Platt;Lukasz Szajkowski

  • Venue:
  • SNPD '08 Proceedings of the 2008 Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing
  • Year:
  • 2008

Abstract

The DNA assembly problem is well known for its high complexity at both the biological and the computational level. The traditional laboratory approach to the problem, which involves DNA sequencing by hybridization or by gel electrophoresis, introduces many errors at both the experimental and the algorithmic stages. The DNA sequences that make up the traditional assembly input are a few hundred nucleotides long and cover each other rather sparsely. A recently proposed biochemical approach to DNA sequencing gives highly reliable output at relatively low cost and in a short time. This is 454 sequencing, based on the pyrosequencing protocol and owned by 454 Life Sciences Corporation. The sequences it produces are shorter (about 100-200 nucleotides), but their coverage of the assembled sequence is very dense. In this paper, we propose a parallel implementation of an algorithm that deals well with such data and outperforms other assembly algorithms used in practice. The algorithm is a heuristic based on a graph model, with the graph built on the set of input sequences. Computational tests were performed on real data obtained from the 454 sequencer during the sequencing of the genome of the bacterium Prochlorococcus marinus.
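
The abstract does not specify the graph model or the heuristic, so the following is only a generic illustration of what "a graph built on the set of input sequences" can look like, not the authors' algorithm. It is a minimal sketch assuming error-free reads, exact suffix-prefix overlaps, and a simple greedy merge; the function names (`overlap`, `build_overlap_graph`, `greedy_assemble`) and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_len` characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(reads, min_len):
    """Directed overlap graph: edge (i, j) -> overlap length between reads i and j."""
    graph = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap(a, b, min_len)
                if k:
                    graph[(i, j)] = k
    return graph


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap.

    This is a textbook greedy heuristic, used here purely for illustration.
    """
    reads = list(reads)
    while len(reads) > 1:
        graph = build_overlap_graph(reads, min_len)
        if not graph:
            break  # no remaining overlaps: the result stays as several contigs
        (i, j), k = max(graph.items(), key=lambda edge: edge[1])
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "TACGGA", "GGATTC"]  # made-up reads, not real 454 data
    print(greedy_assemble(toy_reads, min_len=3))
```

In this sketch the pairwise overlap computations are independent of one another, which is one reason the graph-building step of an assembler lends itself to parallel execution; how the paper actually distributes the work is not stated in the abstract.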