The hash table, especially its external-memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size $b$, searching for a particular key takes only an expected average of $t_q = 1 + 1/2^{\Omega(b)}$ disk accesses for any load factor $\alpha$ bounded away from $1$. However, such near-perfect performance is achieved only when $b$ is known and the hash table is tuned specifically for that block size. In this paper we study whether it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table would automatically perform well across all levels of the memory hierarchy and would need no hardware-specific tuning, an important feature in autonomous databases. We first show that linear probing, a classical collision resolution strategy for hash tables, can easily be made cache-oblivious, but it achieves only $t_q = 1 + O(\alpha/b)$. We then demonstrate that it is possible to obtain $t_q = 1 + 1/2^{\Omega(b)}$, thus matching the cache-aware bound, if the following two conditions hold: (a) $b$ is a power of 2; and (b) every block starts at a memory address divisible by $b$. Both conditions hold on real machines, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is $t_q = 1 + O(\alpha/b)$, which is exactly what linear probing achieves.
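To make the cost measure $t_q$ concrete, the following is a minimal simulation sketch, not the paper's construction. It fills a linear-probing table and counts how many aligned blocks of size $b$ a successful search touches on average. All names (`build_linear_probing`, `blocks_touched`, `average_block_accesses`) are hypothetical, and a seeded pseudo-random home slot per key is assumed as a stand-in for a truly random hash function.

```python
import random


def build_linear_probing(n_keys, capacity, seed=0):
    """Insert n_keys keys with linear probing.

    Assumption: each key gets a pseudo-random home slot, standing in
    for a truly random hash function. Returns (home, final) slot pairs.
    """
    if n_keys > capacity:
        raise ValueError("table cannot hold more keys than slots")
    rng = random.Random(seed)
    table = [None] * capacity
    placements = []
    for key in range(n_keys):
        home = rng.randrange(capacity)
        slot = home
        # Scan forward (with wraparound) until an empty slot is found.
        while table[slot] is not None:
            slot = (slot + 1) % capacity
        table[slot] = key
        placements.append((home, slot))
    return placements


def blocks_touched(home, final, b, capacity):
    """Count distinct aligned blocks of size b scanned from home to final."""
    length = (final - home) % capacity + 1
    return len({((home + i) % capacity) // b for i in range(length)})


def average_block_accesses(placements, b, capacity):
    """Empirical t_q: mean number of blocks touched per successful search."""
    total = sum(blocks_touched(h, f, b, capacity) for h, f in placements)
    return total / len(placements)


if __name__ == "__main__":
    capacity = 1024
    placements = build_linear_probing(n_keys=512, capacity=capacity)  # load factor 0.5
    for b in (1, 2, 4, 8, 16, 32):
        print(b, average_block_accesses(placements, b, capacity))
```

The same table is measured under several block sizes without being rebuilt, which is the sense in which the layout is oblivious to $b$. Blocks are assumed to be aligned at multiples of $b$, matching condition (b) above, and the capacity is chosen as a multiple of every tested $b$.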