Cache-oblivious streaming B-trees

Authors:
Michael A. Bender (Stony Brook University, Stony Brook, NY)
Martin Farach-Colton (Rutgers University, Piscataway, NJ)
Jeremy T. Fineman (MIT CSAIL, Cambridge, MA)
Yonatan R. Fogel (Stony Brook University, Stony Brook, NY)
Bradley C. Kuszmaul (MIT CSAIL, Cambridge, MA)
Jelani Nelson (MIT CSAIL, Cambridge, MA)
Venue: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Year: 2007

References (14)

The input/output complexity of sorting and related problems. Communications of the ACM.
Deterministic skip lists. SODA '92: Proceedings of the third annual ACM-SIAM symposium on Discrete algorithms.
On external memory graph traversal. SODA '00: Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms.
Ubiquitous B-Tree. ACM Computing Surveys (CSUR).
Cache-oblivious priority queue and graph algorithm applications. STOC '02: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing.
Design of Dynamic Data Structures.
Cache oblivious search trees via binary trees of small height. SODA '02: Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms.
Lower bounds for external memory dictionaries. SODA '03: Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms.
Scanning and Traversing: Maintaining Data for Traversals in a Memory Hierarchy. ESA '02: Proceedings of the 10th Annual European Symposium on Algorithms.
Cache-Oblivious Algorithms. FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science.
Cache-oblivious B-trees. FOCS '00: Proceedings of the 41st Annual Symposium on Foundations of Computer Science.
Optimal External Memory Interval Management. SIAM Journal on Computing.
A locality-preserving cache-oblivious dynamic dictionary. Journal of Algorithms.
Cache-oblivious string B-trees. Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.

Cited by (15)

Spyglass: fast, scalable metadata search for large-scale storage systems. FAST '09: Proceedings of the 7th conference on File and storage technologies.
Cache-oblivious polygon indecomposability testing. Proceedings of the 4th International Workshop on Parallel and Symbolic Computation.
Engineering scalable, cache and space efficient tries for strings. The VLDB Journal: The International Journal on Very Large Data Bases.
Cache-oblivious dynamic dictionaries with update/query tradeoffs. SODA '10: Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms.
Redesigning the string hash table, burst trie, and BST to exploit cache. Journal of Experimental Algorithmics (JEA).
Don't thrash: how to cache your hash on flash. HotStorage'11: Proceedings of the 3rd USENIX conference on Hot topics in storage and file systems.
Stratified B-trees and versioned dictionaries. HotStorage'11: Proceedings of the 3rd USENIX conference on Hot topics in storage and file systems.
An efficient multi-tier tablet server storage architecture. Proceedings of the 2nd ACM Symposium on Cloud Computing.
bLSM: a general purpose log structured merge tree. SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data.
Dynamic Indexability and the Optimality of B-Trees. Journal of the ACM (JACM).
The TokuFS streaming file system. HotStorage'12: Proceedings of the 4th USENIX conference on Hot Topics in Storage and File Systems.
Don't thrash: how to cache your hash on flash. Proceedings of the VLDB Endowment.
Memory-efficient groupby-aggregate using compressed buffer trees. Proceedings of the 4th annual Symposium on Cloud Computing.
TABLEFS: enhancing metadata efficiency in the local file system. USENIX ATC'13: Proceedings of the 2013 USENIX conference on Annual Technical Conference.
Building workload-independent storage with VT-trees. FAST'13: Proceedings of the 11th USENIX conference on File and Storage Technologies.

Abstract

A streaming B-tree is a dictionary that efficiently implements insertions and range queries. We present two cache-oblivious streaming B-trees: the shuttle tree and the cache-oblivious lookahead array (COLA). For block-transfer size $B$ and $N$ elements, the shuttle tree implements searches in optimal $O(\log_{B+1} N)$ transfers, range queries of $L$ successive elements in optimal $O(\log_{B+1} N + L/B)$ transfers, and insertions in $O\big((\log_{B+1} N)/B^{\Theta(1/(\log\log B)^2)} + (\log^2 N)/B\big)$ transfers, which is an asymptotic speedup over traditional B-trees if $B \ge (\log N)^{1 + c\log\log\log^2 N}$ for any constant $c > 1$. A COLA implements searches in $O(\log N)$ transfers, range queries in $O(\log N + L/B)$ transfers, and insertions in amortized $O((\log N)/B)$ transfers, matching the bounds for a (cache-aware) buffered repository tree. A partially deamortized COLA matches these bounds but reduces the worst-case insertion cost to $O(\log N)$ if memory size $M = \Omega(\log N)$. We also present a cache-aware version of the COLA, the lookahead array, which achieves the same bounds as Brodal and Fagerberg's (cache-aware) $B^{\varepsilon}$-tree. We compare our COLA implementation to a traditional B-tree. Our COLA implementation runs 790 times faster for random insertions, 3.1 times slower for insertions of sorted data, and 3.5 times slower for searches.
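The COLA's insertion strategy can be illustrated with a small in-memory Python sketch. It is not the authors' implementation: it assumes integer keys, keeps each level as a separate Python list, and omits the lookahead pointers, so a search binary-searches every level and costs O(log² N) comparisons rather than the O(log N) transfers stated above. The names `SimpleCOLA`, `merge`, `contains`, and `range_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional


def merge(a: List[int], b: List[int]) -> List[int]:
    """Merge two sorted lists in one sequential pass over each input."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


class SimpleCOLA:
    """Hypothetical, simplified COLA without lookahead pointers.

    Level k is either empty (None) or holds exactly 2**k sorted keys, so the
    set of full levels mirrors the binary representation of the key count.
    Duplicate keys are kept (multiset semantics).
    """

    def __init__(self) -> None:
        self.levels: List[Optional[List[int]]] = []

    def insert(self, key: int) -> None:
        # Like incrementing a binary counter: carry the new key upward,
        # merging it with every full level until an empty level is reached.
        carry = [key]
        k = 0
        while k < len(self.levels) and self.levels[k] is not None:
            carry = merge(self.levels[k], carry)
            self.levels[k] = None
            k += 1
        if k == len(self.levels):
            self.levels.append(None)
        self.levels[k] = carry  # len(carry) == 2**k here

    def contains(self, key: int) -> bool:
        # Binary search in every non-empty level: O(log^2 N) comparisons.
        # Lookahead pointers are what remove this extra log factor.
        for level in self.levels:
            if level:
                i = bisect_left(level, key)
                if i < len(level) and level[i] == key:
                    return True
        return False

    def range_query(self, lo: int, hi: int) -> List[int]:
        # Collect the keys in [lo, hi] from each level, then combine them.
        out: List[int] = []
        for level in self.levels:
            if level:
                out.extend(level[bisect_left(level, lo):bisect_right(level, hi)])
        return sorted(out)


cola = SimpleCOLA()
for x in [5, 3, 9, 1, 7, 2]:
    cola.insert(x)
print(cola.contains(7), cola.range_query(2, 7))  # True [2, 3, 5, 7]
```

Each key takes part in at most one merge per level, and every merge is a sequential scan of two sorted arrays. That is the intuition behind the amortized O((log N)/B) insertion bound: a key is moved O(log N) times, and each move costs O(1/B) transfers when amortized over a block.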