Don't thrash: how to cache your hash on flash

  • Authors:
  • Michael A. Bender; Martin Farach-Colton; Rob Johnson; Bradley C. Kuszmaul; Dzejla Medjedovic; Pablo Montes; Pradeep Shetty; Richard P. Spillane; Erez Zadok

  • Affiliations:
  • Stony Brook University and Tokutek; Rutgers University and Tokutek; Stony Brook University; MIT and Tokutek; Stony Brook University; Stony Brook University; Stony Brook University; Stony Brook University; Stony Brook University

  • Venue:
  • HotStorage'11 Proceedings of the 3rd USENIX conference on Hot topics in storage and file systems
  • Year:
  • 2011

Abstract

Many large storage systems use approximate-membership-query (AMQ) data structures to deal with the massive amounts of data that they process. An AMQ data structure is a dictionary that trades off space for a false positive rate on membership queries. It is designed to fit into small, fast storage, and it is used to avoid I/Os on slow storage. The Bloom filter is a well-known example of an AMQ data structure. Bloom filters, however, do not scale outside of main memory. This paper describes the Cascade Filter, an AMQ data structure that scales beyond main memory, supporting over half a million insertions/deletions per second and over 500 lookups per second on a commodity flash-based SSD.
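To make the space versus false-positive trade-off concrete, here is a minimal sketch of an in-memory Bloom filter, the AMQ example the abstract names. It is not the Cascade Filter and is not taken from the paper. The class name `BloomFilter`, its parameters `capacity` and `fp_rate`, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration; the sizing uses the standard Bloom filter formulas.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative in-memory Bloom filter (hypothetical helper, not from the paper)."""

    def __init__(self, capacity, fp_rate):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Assumption: derive k bit positions from one SHA-256 digest by double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        # Set every bit position chosen for this key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means definitely absent; True means present or a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter(capacity=1000, fp_rate=0.01)
    for i in range(1000):
        bf.add(f"key-{i}")
    print("bytes used:", len(bf.bits))
    print("member found:", bf.might_contain("key-42"))
    false_hits = sum(bf.might_contain(f"absent-{i}") for i in range(10000))
    print("observed false-positive rate:", false_hits / 10000)
```

A lookup that returns False lets a storage system skip the I/O to slow storage entirely, which is the use the abstract describes. Each insertion and lookup touches several scattered bit positions, so a filter of this shape performs poorly once the bit array no longer fits in main memory.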