Cache-oblivious hashing

Authors:
Rasmus Pagh; Zhewei Wei; Ke Yi; Qin Zhang
Affiliations:
IT University of Copenhagen, Copenhagen, Denmark; Hong Kong University of Science and Technology, Hong Kong, Hong Kong; Hong Kong University of Science and Technology, Hong Kong, Hong Kong; Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Venue:
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2010

Abstract

The hash table, especially its external-memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size $b$, searching for a particular key takes only an expected average of $t_q = 1 + 1/2^{\Omega(b)}$ disk accesses for any load factor $\alpha$ bounded away from $1$. However, such near-perfect performance is achieved only when $b$ is known and the hash table is specifically tuned to work with that blocking. In this paper we study whether it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table will automatically perform well across all levels of the memory hierarchy and does not need any hardware-specific tuning, an important feature in autonomous databases. We first show that linear probing, a classical collision resolution strategy for hash tables, can easily be made cache-oblivious, but it achieves only $t_q = 1 + O(\alpha/b)$. Then we demonstrate that it is possible to obtain $t_q = 1 + 1/2^{\Omega(b)}$, thus matching the cache-aware bound, if the following two conditions hold: (a) $b$ is a power of 2; and (b) every block starts at a memory address divisible by $b$. Both conditions hold on a real machine, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is $t_q = 1 + O(\alpha/b)$, which is exactly what linear probing achieves.
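
To make the quantity $t_q$ concrete, the following is a minimal simulation sketch, not the paper's construction. It builds a plain linear-probing table and counts how many distinct blocks a successful search touches for a given block size. All names (`build_table`, `blocks_touched`, `average_accesses`) are hypothetical, Python's pseudo-random generator stands in for the truly random hash function assumed in the abstract, and blocks are assumed to be aligned so that block $j$ covers cells $jb$ to $(j+1)b - 1$.

```python
import random


def build_table(keys, capacity, seed=0):
    """Insert keys with linear probing.

    Assumption: a seeded pseudo-random home slot per key stands in for a
    truly random hash function. Requires len(keys) <= capacity.
    """
    rng = random.Random(seed)
    home = {k: rng.randrange(capacity) for k in keys}
    table = [None] * capacity
    for k in keys:
        i = home[k]
        while table[i] is not None:  # probe forward until a free cell
            i = (i + 1) % capacity
        table[i] = k
    return table, home


def blocks_touched(table, home, key, b):
    """Count distinct blocks read by a successful search for `key`.

    Assumption: blocks are aligned, so cell i lies in block i // b.
    """
    n = len(table)
    i = home[key]
    seen = set()
    while True:
        seen.add(i // b)
        if table[i] == key:
            return len(seen)
        i = (i + 1) % n


def average_accesses(num_keys, capacity, b, seed=0):
    """Average number of blocks touched over all stored keys."""
    keys = list(range(num_keys))
    table, home = build_table(keys, capacity, seed)
    total = sum(blocks_touched(table, home, k, b) for k in keys)
    return total / num_keys


if __name__ == "__main__":
    # Load factor 0.5; the table itself never depends on b, only the
    # accounting does, which is the sense in which it is cache-oblivious.
    for b in (1, 4, 16, 64):
        print(b, average_accesses(512, 1024, b))
```

The table layout is built without any knowledge of `b`; only the cost accounting uses it. Running the script for increasing `b` should show the average approaching one block per search, consistent in spirit with the $1 + O(\alpha/b)$ bound quoted above, though a single simulated run is only an illustration and not a verification of the bound.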