SkimpyStash: RAM space skimpy key-value store on flash-based storage

  • Authors:
  • Biplob Debnath;Sudipta Sengupta;Jin Li

  • Affiliations:
  • EMC, Santa Clara, CA, USA;Microsoft Research, Redmond, WA, USA;Microsoft Research, Redmond, WA, USA

  • Venue:
  • Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present SkimpyStash, a RAM space skimpy key-value store on flash-based storage, designed for high throughput, low latency server applications. The distinguishing feature of SkimpyStash is the design goal of extremely low RAM footprint at about 1 (± 0.5) byte per key-value pair, which is more aggressive than earlier designs. SkimpyStash uses a hash table directory in RAM to index key-value pairs stored in a log-structured manner on flash. To break the barrier of a flash pointer (say, 4 bytes) worth of RAM overhead per key, it "moves" most of the pointers that locate each key-value pair from RAM to flash itself. This is realized by (i) resolving hash table collisions using linear chaining, where multiple keys that resolve (collide) to the same hash table bucket are chained in a linked list, and (ii) storing the linked lists on flash itself with a pointer in each hash table bucket in RAM pointing to the beginning record of the chain on flash, hence incurring multiple flash reads per lookup. Two further techniques are used to improve performance: (iii) two-choice based load balancing to reduce wide variation in bucket sizes (hence, chain lengths and associated lookup times), and a bloom filter in each hash table directory slot in RAM to disambiguate the choice during lookup, and (iv) compaction procedure to pack bucket chain records contiguously onto flash pages so as to reduce flash reads during lookup. The average bucket size is the critical design parameter that serves as a powerful knob for making a continuum of tradeoffs between low RAM usage and low lookup latencies. Our evaluations on commodity server platforms with real-world data center applications show that SkimpyStash provides throughputs from few 10,000s to upwards of 100,000 get-set operations/sec.