The de Bruijn graph is a data structure widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on an in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (≥30 GB). We propose a new encoding of the de Bruijn graph that occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives. Minia, an assembler implementing this structure, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.
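
The encoding described above can be illustrated with a minimal sketch. It is not Minia's implementation: every name here (`BloomFilter`, `build`, `graph_neighbors`) and the filter parameters are hypothetical. It assumes that a "critical" false positive is a k-mer that the Bloom filter wrongly accepts and that is adjacent to a true k-mer, and it ignores reverse complements for brevity.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (illustrative, not tuned for size)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One salted BLAKE2b hash per hash function; the seed is the salt.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def kmers(read, k):
    """All length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def successors(kmer):
    return [kmer[1:] + c for c in "ACGT"]


def predecessors(kmer):
    return [c + kmer[:-1] for c in "ACGT"]


def build(reads, k, num_bits, num_hashes):
    """Return the Bloom filter, the critical false positives, and the true nodes."""
    true_nodes = set()
    for read in reads:
        true_nodes |= kmers(read, k)

    bloom = BloomFilter(num_bits, num_hashes)
    for node in true_nodes:
        bloom.add(node)

    # Assumption: only false positives adjacent to a true node can mislead
    # a traversal that starts from a true node, so only those are stored.
    critical = {
        nb
        for node in true_nodes
        for nb in successors(node) + predecessors(node)
        if nb in bloom and nb not in true_nodes
    }
    return bloom, critical, true_nodes


def graph_neighbors(kmer, bloom, critical):
    """Neighbors of a true node, using only the filter and the critical set."""
    return {
        nb
        for nb in successors(kmer) + predecessors(kmer)
        if nb in bloom and nb not in critical
    }
```

In this sketch `true_nodes` is returned only so the result can be checked; once `critical` has been computed, neighbor queries from any true node need just the filter and the critical set. A deliberately small filter (for example 64 bits) makes false positives likely, which shows that the critical set, not the filter size, is what keeps such queries exact.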