BloomFlash: Bloom Filter on Flash-Based Storage

  • Authors:
  • Biplob Debnath;Sudipta Sengupta;Jin Li;David J. Lilja;David H. C. Du

  • Affiliations:
  • -;-;-;-;-

  • Venue:
  • ICDCS '11 Proceedings of the 2011 31st International Conference on Distributed Computing Systems
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The bloom filter is a probabilistic data structure that provides a compact representation of a set of elements. To keep false positive probabilities low, the size of the bloom filter must be dimensioned a priori to be linear in the maximum number of keys inserted, with the linearity constant ranging typically from one to few bytes. A bloom filter is most commonly used as an in memory data structure, hence its size is limited by the availability of RAM space on the machine. As datasets have grown over time to Internet scale, so have the RAM space requirements of bloom filters. If sufficient RAM space is not available, we advocate that flash memory may serve as a suitable medium for storing bloom filters, since it is about one-tenth the cost of RAM per GB while still providing access times orders of magnitude faster than hard disk. We present BLOOMFLASH, a bloom filter designed for flash memory based storage, that provides a new dimension of trade off with bloom filter access times to reduce RAM space usage (and hence system cost). The simple design of a single flat bloom filter on flash suffers from many performance bottlenecks, including in-place bit updates that are inefficient on flash and multiple reads and random writes spread out across many flash pages for a single lookup or insert operation. To mitigate these performance bottlenecks, BLOOMFLASH leverages two key design innovations: (i) buffering bit updates in RAM and applying them in bulk to flash that helps to reduce random writes to flash, and (ii) a hierarchical bloom filter design consisting of component bloom filters, stored one per flash page, that helps to localize reads and writes on flash. We use two real-world data traces taken from representative bloom filter applications to drive and evaluate our design. BLOOMFLASH achieves bloom filter access times in the range of few tens of microseconds, thus allowing up to order of tens of thousands operations per sec.