String hashing for linear probing

Authors:
Mikkel Thorup
Affiliations:
Shannon Laboratory, Florham Park, NJ
Venue:
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Year:
2009

References (13)

The analysis of closed hashing under limited randomness. STOC '90: Proceedings of the twenty-second annual ACM symposium on Theory of computing.
The C programming language.
A reliable randomized algorithm for the closest-pair problem. Journal of Algorithms.
The art of computer programming, volume 3: (2nd ed.) sorting and searching.
The C++ Programming Language.
Universal Hashing and k-Wise Independent Random Variables via Integer Arithmetic without Primes. STACS '96: Proceedings of the 13th Annual Symposium on Theoretical Aspects of Computer Science.
Polynomial Hash Functions Are Reliable (Extended Abstract). ICALP '92: Proceedings of the 19th International Colloquium on Automata, Languages and Programming.
Universal classes of hash functions (Extended Abstract). STOC '77: Proceedings of the ninth annual ACM symposium on Theory of computing.
Closed Hashing is Computable and Optimally Randomizable with Universal Hash Functions.
Tabulation based 4-universal hashing with applications to second moment estimation. SODA '04: Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms.
Cuckoo hashing. Journal of Algorithms.
Linear probing with constant independence. Proceedings of the thirty-ninth annual ACM symposium on Theory of computing.
Why simple hash functions work: exploiting the entropy in a data stream. Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms.

Cited by (5)

The power of simple tabulation hashing. Proceedings of the forty-third annual ACM symposium on Theory of computing.
Linear Probing with 5-wise Independence. SIAM Review.
The universality of iterated hashing over variable-length strings. Discrete Applied Mathematics.
The Power of Simple Tabulation Hashing. Journal of the ACM (JACM).
Tabulation-Based 5-Independent Hashing with Applications to Linear Probing and Second Moment Estimation. SIAM Journal on Computing.

Abstract

Linear probing is one of the most popular implementations of dynamic hash tables storing all keys in a single array. When we get a key, we first hash it to a location. Next we probe consecutive locations until the key or an empty location is found. At STOC'07, Pagh et al. presented data sets where the standard implementation of 2-universal hashing leads to an expected number of Ω(log n) probes. They also showed that with 5-universal hashing, the expected number of probes is constant. Unfortunately, we do not have 5-universal hashing for, say, variable-length strings. When we want to do such complex hashing from a complex domain, the generic standard solution is that we first do collision-free hashing (w.h.p.) into a simpler intermediate domain, and second apply the complicated hash function on this intermediate domain. Our contribution is that for an expected constant number of linear probes, it suffices that each key has O(1) expected collisions with the first hash function, as long as the second hash function is 5-universal. This means that the intermediate domain can be n times smaller, and such a smaller intermediate domain typically means that the overall hash function can be made simpler and at least twice as fast. The same doubling of hashing speed for O(1) expected probes follows for most domains bigger than 32-bit integers, e.g., 64-bit integers and fixed-length strings. In addition, we study how the overhead from linear probing diminishes as the array gets larger, and what happens if strings are stored directly as intervals of the array. These cases were not considered by Pagh et al.