Parallel and memory-efficient reads indexing for genome assembly

Authors:
Guillaume Chapuis; Rayan Chikhi; Dominique Lavenier
Affiliations:
Computer Science Department, ENS Cachan/IRISA, Rennes, France (all three authors)
Venue:
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II
Year:
2011

Abstract

As genomes, transcriptomes and metagenomes are being sequenced at a faster pace than ever, there is a pressing need for efficient genome assembly methods. Two practical issues in assembly are heavy memory usage and long execution times during the read indexing phase. This article proposes a parallel and memory-efficient method for indexing reads prior to assembly. Specifically, it designs a hash-based structure that stores a reduced amount of read information, and erroneous entries are filtered on the fly while the index is being built. A prototype implementation has been applied to real Illumina short reads. Benchmarks show that this indexing method requires significantly less memory than the indexing methods used by popular assemblers.
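The abstract only outlines the index, so the following is a minimal Python sketch of the general idea under assumptions that are ours, not the paper's. Reads are indexed by their k-mers, and each entry stores read ids rather than sequences. A k-mer seen only once is treated as a probable sequencing error and never enters the index. Parallelism comes from splitting the k-mer space into disjoint hash partitions (via `zlib.crc32`) that are built independently and then merged. The function names (`kmers`, `partition_of`, `build_partition`, `build_index`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def partition_of(kmer, n_parts):
    """Deterministically assign a k-mer to one of n_parts hash partitions."""
    return zlib.crc32(kmer.encode()) % n_parts


def build_partition(reads, k, part, n_parts):
    """Index the k-mers of one partition, filtering singletons on the fly.

    Assumption (illustrative, not the paper's rule): a k-mer seen once is a
    likely error. Its first occurrence waits in `pending`; it is promoted to
    `index` only when it occurs a second time.
    """
    pending = {}  # k-mer -> read id of its single occurrence so far
    index = {}    # k-mer -> read ids, one entry per occurrence (>= 2 entries)
    for read_id, seq in enumerate(reads):
        for km in kmers(seq, k):
            if partition_of(km, n_parts) != part:
                continue  # another worker owns this k-mer
            if km in index:
                index[km].append(read_id)
            elif km in pending:
                index[km] = [pending.pop(km), read_id]
            else:
                pending[km] = read_id
    return index  # k-mers still in `pending` are dropped as probable errors


def build_index(reads, k, n_parts=4):
    """Build all partitions concurrently, then merge them into one dict."""
    merged = {}
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = pool.map(
            lambda p: build_partition(reads, k, p, n_parts), range(n_parts)
        )
        for part_index in results:
            merged.update(part_index)  # partitions are disjoint: no key clashes
    return merged
```

For `reads = ["ACGTACGT", "CGTACGTT", "TTTTGGGG"]` and `k = 4`, `build_index(reads, 4)` keeps only `ACGT`, `CGTA`, `GTAC` and `TACG`; `index["ACGT"]` is `[0, 0, 1]` because it occurs twice in read 0 and once in read 1, while every k-mer of read 2 occurs once and is filtered out. Note that `pending` still holds every singleton during construction, so a more memory-conscious variant might replace it with a probabilistic membership structure such as a Bloom filter. Also, because of CPython's global interpreter lock, the threads here illustrate the partitioning scheme rather than deliver a real speedup; a process pool would be needed for that.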