Tabulation-Based 5-Independent Hashing with Applications to Linear Probing and Second Moment Estimation

Authors:
Mikkel Thorup; Yin Zhang
Affiliations:
mthorup@research.att.com; yzhang@cs.utexas.edu
Venue:
SIAM Journal on Computing
Year:
2012

Abstract

In the framework of Wegman and Carter, a $k$-independent hash function maps any $k$ distinct keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for data streams. The classic 5-independent hash function evaluates a degree-4 polynomial over a prime field containing the key domain $[n]=\{0,\ldots,n-1\}$. Here we present an efficient 5-independent hash function that uses no multiplications. Instead, for any parameter $c$, we make $2c-1$ lookups in tables of size $O(n^{1/c})$. In experiments on different computers, our scheme was faster than the polynomial method by factors of 1.8 to 10. We also conducted experiments on the performance of hash functions inside the above applications. In particular, we give realistic examples of inputs that make the most popular 2-independent hash function perform quite poorly. This illustrates the advantage of using schemes with provably good expected performance for all inputs.
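
To make the two approaches in the abstract concrete, here is a minimal Python sketch, offered as an illustration rather than the authors' implementation. The first part evaluates a random degree-4 polynomial over a prime field, as the abstract describes for the classic method; the choice of the Mersenne prime $2^{61}-1$ is an assumption that suits 32-bit keys. The second part shows only the lookup pattern for $c=2$: the key is split into two characters, and $2c-1=3$ tables are consulted with no multiplications. The derived index `x0 + x1`, the fully random table contents, and all function names are assumptions made for this sketch; it is not verified to reproduce the paper's construction or its independence guarantee.

```python
import random

# Assumption: the Mersenne prime 2^61 - 1, a prime field that contains 32-bit keys.
P = (1 << 61) - 1


def make_poly5(rng):
    """Draw five random coefficients a0..a4 of a degree-4 polynomial over Z_P."""
    return [rng.randrange(P) for _ in range(5)]


def poly5_hash(coeffs, x):
    """Evaluate a0 + a1*x + a2*x^2 + a3*x^3 + a4*x^4 mod P using Horner's rule."""
    acc = 0
    for a in reversed(coeffs):  # start from the leading coefficient a4
        acc = (acc * x + a) % P
    return acc


def make_tables(rng, char_bits=16, out_bits=32):
    """Build three random tables for c = 2 (hypothetical layout for this sketch).

    Two tables are indexed by the key characters x0 and x1; the third is
    indexed by the assumed derived value x0 + x1, which is at most 2*size - 2.
    """
    size = 1 << char_bits
    t0 = [rng.getrandbits(out_bits) for _ in range(size)]
    t1 = [rng.getrandbits(out_bits) for _ in range(size)]
    t2 = [rng.getrandbits(out_bits) for _ in range(2 * size - 1)]
    return t0, t1, t2


def tab_hash(tables, x, char_bits=16):
    """Hash a key of 2*char_bits bits with three lookups, shifts, one add, and XORs."""
    t0, t1, t2 = tables
    x0 = x & ((1 << char_bits) - 1)  # low character
    x1 = x >> char_bits              # high character
    return t0[x0] ^ t1[x1] ^ t2[x0 + x1]


if __name__ == "__main__":
    rng = random.Random(2012)
    coeffs = make_poly5(rng)
    tables = make_tables(rng)
    key = 0xDEADBEEF
    print(poly5_hash(coeffs, key), tab_hash(tables, key))
```

With 32-bit keys and $c=2$, each character table has $2^{16}$ entries, which matches the $O(n^{1/c})$ table size stated in the abstract; a larger $c$ would shrink the tables at the cost of more lookups.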