When cycles are cheap, some tables can be huge

Authors:
Bin Fan; Dong Zhou; Hyeontaek Lim; Michael Kaminsky; David G. Andersen
Affiliations:
Carnegie Mellon University; Carnegie Mellon University; Carnegie Mellon University; Intel Labs and Carnegie Mellon University; Carnegie Mellon University
Venue:
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Year:
2013

Abstract

The goal of this paper is to raise a new question: what would change in operating systems and networks if it were feasible to have a (type of) lookup table that supported billions, or hundreds of billions, of entries, using only a few bits per entry? We approach this question by showing that the progress of Moore's law, which continues to deliver more and more transistors per chip, makes it possible to apply formerly ludicrous amounts of brute-force parallel computation to find space-saving opportunities. We make two primary observations. First, some applications can tolerate getting an incorrect answer from the table when they query for a key that is not in the table. For these applications, we can discard the keys entirely and use storage space only for the values. Second, for some applications the value is not arbitrary: if the range of output values is small, we can instead view the problem as one of set separation. Together, these observations allow us to shrink the mapping by brute-force searching for a "perfect mapping" from inputs to outputs that (1) does not store the input keys and (2) avoids collisions (and thus the storage that resolving them would require). Our preliminary results show that we can reduce memory consumption by an order of magnitude compared to traditional hash tables while providing competitive or better lookup performance.
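The abstract does not spell out the construction, so the following is only a minimal sketch of the idea, not the authors' implementation. It makes several assumptions. Values are a single bit, which is the simplest case of set separation into two sets. Keys are hashed into small groups. BLAKE2b from Python's `hashlib`, with the seed passed as its `salt`, stands in for a family of hash functions. The names `SetSep`, `group_size`, `max_tries`, and `seed_bits` are hypothetical. For each group, the builder tries seeds 0, 1, 2, ... until one hashes every key in the group to its required bit, and then stores only that seed. For a group of n keys, a random seed succeeds with probability about 2^-n. The search therefore takes roughly 2^n tries and yields a seed of roughly n bits, or about one bit per key before overhead. That cost is exponential in the group size, which is why cheap cycles matter.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical sketch of brute-force set separation; not the paper's code.


def _group_of(key: bytes, num_groups: int) -> int:
    # Hash used only to partition keys into small groups (domain-separated via `person`).
    digest = hashlib.blake2b(key, digest_size=8, person=b"group").digest()
    return int.from_bytes(digest, "little") % num_groups


def _bit(seed: int, key: bytes) -> int:
    # The seed selects one member of a hash family (here: BLAKE2b salt); keep one output bit.
    digest = hashlib.blake2b(key, digest_size=8, salt=seed.to_bytes(16, "little")).digest()
    return digest[0] & 1


class SetSep:
    """Keyless 1-bit map: stores one seed per group and never stores the keys."""

    def __init__(self, mapping: Dict[bytes, int], group_size: int = 4,
                 max_tries: int = 1 << 22) -> None:
        if any(v not in (0, 1) for v in mapping.values()):
            raise ValueError("this sketch only separates keys into two sets (values 0/1)")
        self.num_groups = max(1, -(-len(mapping) // group_size))  # ceiling division
        groups: List[List[Tuple[bytes, int]]] = [[] for _ in range(self.num_groups)]
        for key, value in mapping.items():
            groups[_group_of(key, self.num_groups)].append((key, value))
        # Brute force: for each group, keep the first seed that reproduces every value.
        self.seeds = [self._search(group, max_tries) for group in groups]

    @staticmethod
    def _search(group: List[Tuple[bytes, int]], max_tries: int) -> int:
        for seed in range(max_tries):
            if all(_bit(seed, k) == v for k, v in group):
                return seed
        raise RuntimeError("no separating seed found; try a smaller group_size")

    def lookup(self, key: bytes) -> int:
        # Member keys get their stored bit; non-members get an arbitrary 0 or 1.
        return _bit(self.seeds[_group_of(key, self.num_groups)], key)

    def seed_bits(self) -> int:
        # Width needed to store every seed in a fixed-size slot.
        return max(seed.bit_length() for seed in self.seeds)
```

For example, `SetSep({b"flow-0": 0, b"flow-1": 1}).lookup(b"flow-1")` returns 1. Querying a key that was never inserted still returns some bit. This is exactly the incorrect-answer-for-absent-keys behavior that the first observation assumes the application can tolerate. Smaller groups make each search cheaper but add more per-group overhead. A real design would also pack seeds into fixed-width slots rather than Python integers.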