As genomes, transcriptomes and metagenomes are sequenced at an ever faster pace, there is a pressing need for efficient genome assembly methods. Two practical bottlenecks in assembly are heavy memory usage and long execution time during the read indexing phase. This article proposes a parallel, memory-efficient method for indexing reads prior to assembly. Specifically, it designs a hash-based structure that stores a reduced amount of read information, and it filters erroneous entries on the fly during index construction. A prototype implementation has been developed and applied to real Illumina short reads. Benchmark evaluation shows that this indexing method requires significantly less memory than the indexing methods of popular assemblers.
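The abstract does not specify the layout of the hash structure or the filtering rule, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that the index is keyed by canonical k-mers, that the "reduced information" is just an occurrence count, and that an entry is treated as erroneous until it has been seen twice. The names `canonical` and `build_index` and the parameter `k` are hypothetical, and the sketch is single-threaded.

```python
# Illustrative sketch only: a k-mer hash index that keeps counts instead of
# read positions and admits a k-mer only on its second sighting.
# All names and the "seen twice" rule are assumptions, not taken from the paper.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def build_index(reads, k):
    """Map each canonical k-mer seen at least twice to its occurrence count."""
    seen_once = set()  # candidates that may still turn out to be errors
    index = {}         # k-mers confirmed by a second occurrence
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not set(kmer) <= _VALID:
                continue  # skip k-mers with ambiguous bases such as N
            key = canonical(kmer)
            if key in index:
                index[key] += 1
            elif key in seen_once:
                # Second sighting: promote from candidate to indexed entry.
                seen_once.remove(key)
                index[key] = 2
            else:
                seen_once.add(key)
    return index


if __name__ == "__main__":
    reads = ["ACGTA", "TACGT", "ACGTT"]
    print(build_index(reads, k=5))
```

In this sketch a k-mer that occurs only once never enters `index`, which is one simple way to drop likely sequencing errors during construction rather than in a separate pass. The `seen_once` set still costs memory; a more compact probabilistic structure could stand in for it, at the price of occasional false admissions.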