Space-efficient and exact de Bruijn graph representation based on a Bloom filter

Authors:
Rayan Chikhi; Guillaume Rizk
Affiliations:
Computer Science Department, ENS Cachan/IRISA, Rennes, France; Algorizk, Paris, France
Venue:
WABI'12 Proceedings of the 12th international conference on Algorithms in Bioinformatics
Year:
2012

Abstract

The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on an in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (≥30 GB). We propose a new encoding of the de Bruijn graph that occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives. Minia, an assembler implementing this structure, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.
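
The encoding outlined in the abstract can be illustrated with a minimal Python sketch. This is not Minia's implementation: the class names (BloomFilter, BloomDeBruijn), the hash construction, and the parameters are hypothetical, and reverse complements are ignored for brevity. The sketch also assumes that a "critical" false positive is a k-mer adjacent to a true node that the Bloom filter wrongly accepts, so that storing only those k-mers explicitly should make neighbor queries exact when traversing from true nodes.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (hypothetical helper, not Minia's code)."""

    def __init__(self, n_bits, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting BLAKE2b with the hash index.
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def kmers(seq, k):
    """Return the set of all k-mers (length-k substrings) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def neighbors(kmer):
    """Yield the 8 possible de Bruijn neighbors (4 successors, 4 predecessors)."""
    for c in "ACGT":
        yield kmer[1:] + c
        yield c + kmer[:-1]


class BloomDeBruijn:
    """Bloom filter plus an explicit set of critical false positives."""

    def __init__(self, true_kmers, n_bits, n_hashes=3):
        true_kmers = set(true_kmers)
        self.bloom = BloomFilter(n_bits, n_hashes)
        for km in true_kmers:
            self.bloom.add(km)
        # Assumption: critical false positives are neighbors of true nodes
        # that the filter accepts although they are not true nodes.
        self.critical_fp = {
            nb
            for km in true_kmers
            for nb in neighbors(km)
            if nb in self.bloom and nb not in true_kmers
        }

    def has_node(self, kmer):
        # Exact for any k-mer that is a neighbor of a true node.
        return kmer in self.bloom and kmer not in self.critical_fp

    def successors(self, kmer):
        return [kmer[1:] + c for c in "ACGT" if self.has_node(kmer[1:] + c)]


# Usage: a deliberately tiny filter, so false positives are likely.
reads = ["ACGTTGCATGCA", "TGCATGCAACGT"]
true_nodes = set().union(*(kmers(r, 5) for r in reads))
graph = BloomDeBruijn(true_nodes, n_bits=64)
print(graph.successors("ACGTT"))
```

In this sketch the true k-mer set is only needed during construction; afterwards only the bit array and the (typically much smaller) set of critical false positives are queried, which is the source of the memory saving.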