Highly efficient parallel approach to the next-generation DNA sequencing
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II
The DNA assembly problem is well known for its high complexity on both the biological and the computational level. The traditional laboratory approach to the problem, which involves DNA sequencing by hybridization or by gel electrophoresis, introduces many errors at both the experimental and the algorithmic stages. The DNA sequences that form the traditional assembly input are a few hundred nucleotides long and cover each other rather sparsely. A recently proposed biochemical approach to DNA sequencing yields highly reliable output at relatively low cost and in a short time. This is 454 sequencing, based on the pyrosequencing protocol and owned by 454 Life Sciences Corporation. The sequences it produces are shorter (about 100-200 nucleotides), but their coverage of the assembled sequence is very dense. In the paper, we propose a parallel implementation of an algorithm that deals well with such data and outperforms other assembly algorithms used in practice. The algorithm is a heuristic based on a graph model, with the graph built on the set of input sequences. Computational tests were performed on real data obtained from the 454 sequencer during the sequencing of the genome of the bacterium Prochlorococcus marinus.
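The abstract does not spell out the graph model, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a plain overlap graph (vertices are reads, a directed arc joins two reads when a suffix of the first equals a prefix of the second) and a simple greedy merging heuristic. It is serial and assumes error-free reads. The names `overlap`, `build_overlap_graph`, `greedy_assemble` and the parameter `min_len` are hypothetical and chosen for illustration.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_len=3):
    """Map each ordered pair of read indices (i, j) to its overlap length, if any."""
    graph = {}
    for i, j in permutations(range(len(reads)), 2):
        k = overlap(reads[i], reads[j], min_len)
        if k > 0:
            graph[(i, j)] = k
    return graph


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair with the largest overlap (illustrative heuristic only)."""
    contigs = list(reads)
    while len(contigs) > 1:
        graph = build_overlap_graph(contigs, min_len)
        if not graph:
            break  # no remaining overlaps: the contigs stay separate
        (i, j), k = max(graph.items(), key=lambda item: item[1])
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs.append(merged)
    return contigs


# Toy reads, invented for illustration (not data from the paper).
reads = ["ACGTAC", "TACGGA", "GGATTC"]
print(greedy_assemble(reads))
```

In such a model the pairwise overlap computations are independent of each other, which is one natural place for parallelism; how the paper actually distributes the work is not stated in the abstract.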