Don't thrash: how to cache your hash on flash

Authors:
Michael A. Bender;Martin Farach-Colton;Rob Johnson;Bradley C. Kuszmaul;Dzejla Medjedovic;Pablo Montes;Pradeep Shetty;Richard P. Spillane;Erez Zadok
Affiliations:
Stony Brook University and Tokutek;Rutgers University and Tokutek;Stony Brook University;MIT and Tokutek;Stony Brook University;Stony Brook University;Stony Brook University;Stony Brook University;Stony Brook University
Venue:
HotStorage'11 Proceedings of the 3rd USENIX conference on Hot topics in storage and file systems
Year:
2011

Abstract

Many large storage systems use approximate-membership-query (AMQ) data structures to deal with the massive amounts of data that they process. An AMQ data structure is a dictionary that trades off space for a false positive rate on membership queries. It is designed to fit into small, fast storage, and it is used to avoid I/Os on slow storage. The Bloom filter is a well-known example of an AMQ data structure. Bloom filters, however, do not scale outside of main memory. This paper describes the Cascade Filter, an AMQ data structure that scales beyond main memory, supporting over half a million insertions/deletions per second and over 500 lookups per second on a commodity flash-based SSD.
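
To make the AMQ idea concrete, the following is a minimal sketch of a plain in-memory Bloom filter placed in front of a slow dictionary. It is not the Cascade Filter described in the paper. The class names (`BloomFilter`, `FilteredStore`), the SHA-256 double-hashing scheme, and the Python dict standing in for slow storage are all assumptions made for illustration only.

```python
import hashlib


class BloomFilter:
    """Minimal in-memory Bloom filter (illustrative; not the Cascade Filter)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Assumption: derive the bit positions from two 64-bit halves of a
        # single SHA-256 digest (double hashing). Any good hash would do.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


class FilteredStore:
    """Hypothetical wrapper that consults the filter before slow storage."""

    def __init__(self, amq: BloomFilter) -> None:
        self.amq = amq
        self.slow = {}        # stand-in for a dictionary on slow storage
        self.slow_reads = 0   # counts simulated slow I/Os

    def put(self, key: bytes, value) -> None:
        self.amq.add(key)
        self.slow[key] = value

    def get(self, key: bytes):
        if not self.amq.may_contain(key):
            return None       # definite miss: the slow read is skipped
        self.slow_reads += 1  # hit or false positive: pay for the slow read
        return self.slow.get(key)
```

A lookup for a key that was never inserted usually returns without touching `slow`. A false positive costs one wasted slow read but never a wrong answer. Note that each `add` or `may_contain` call touches several effectively random bit positions, which is cheap in RAM but turns into random I/O once the bit array no longer fits in main memory.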