We propose a new succinct de Bruijn graph representation. If the de Bruijn graph of the $k$-mers in a DNA sequence of length $N$ has $m$ edges, it can be represented in $4m + o(m)$ bits, which is much smaller than existing representations. The numbers of outgoing and incoming edges of a node are computed in constant time. The outgoing edge with a given label is found in constant time, and the incoming edge with a given label in $\mathcal{O}(k)$ time. The data structure is constructed in $\mathcal{O}(Nk \log m/\log\log m)$ time using no additional space.
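
To make the operations named above concrete, here is a minimal Python sketch of the graph being represented. It is not the succinct structure itself: it uses plain hash maps, so it does not achieve the stated space or time bounds. It assumes one common convention, namely that each distinct k-mer is an edge between its (k-1)-length prefix and suffix. The function names (`build_de_bruijn`, `edge_count`, `outgoing`, `incoming`, `leading_term_bits`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_de_bruijn(seq, k):
    """Build a plain (non-succinct) de Bruijn graph from the k-mers of seq.

    Assumption: each distinct k-mer is one edge from its (k-1)-prefix
    to its (k-1)-suffix. Outgoing edges are labeled by the k-mer's last
    symbol, incoming edges by its first symbol.
    """
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        src, dst = kmer[:-1], kmer[1:]
        out_edges[src].add(kmer[-1])
        in_edges[dst].add(kmer[0])
    return out_edges, in_edges


def edge_count(out_edges):
    """Number of edges m (distinct k-mers)."""
    return sum(len(labels) for labels in out_edges.values())


def outgoing(out_edges, node, label):
    """Node reached from `node` by the outgoing edge `label`, or None."""
    if label in out_edges.get(node, ()):
        return node[1:] + label
    return None


def incoming(in_edges, node, label):
    """Node from which the incoming edge `label` reaches `node`, or None."""
    if label in in_edges.get(node, ()):
        return label + node[:-1]
    return None


def leading_term_bits(m):
    """Leading term of the 4m + o(m) bit bound; the o(m) term is ignored."""
    return 4 * m


if __name__ == "__main__":
    out_e, in_e = build_de_bruijn("TACGACGTCG", 3)
    m = edge_count(out_e)
    print("edges:", m)
    print("out-degree of CG:", len(out_e.get("CG", ())))
    print("in-degree of AC:", len(in_e.get("AC", ())))
    print("CG --T-->", outgoing(out_e, "CG", "T"))
    print("leading-term bits:", leading_term_bits(m))
```

The hash-map version spends many bytes per edge, which is the overhead a representation using about 4 bits per edge avoids.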