In the framework of Wegman and Carter, a $k$-independent hash function maps any $k$ keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for data streams. The classic 5-independent hash function evaluates a degree-4 polynomial over a prime field containing the key domain $[n]=\{0,\ldots,n-1\}$. Here we present an efficient 5-independent hash function that uses no multiplications. Instead, for any parameter $c$, we make $2c-1$ lookups in tables of size $O(n^{1/c})$. In experiments on different computers, our scheme was 1.8 to 10 times faster than the polynomial method. We also conducted experiments on the performance of hash functions inside the above applications. In particular, we give realistic examples of inputs that make the most popular 2-independent hash function perform quite poorly. This illustrates the advantage of using schemes with provably good expected performance for all inputs.
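
To make the two approaches concrete, here is a minimal Python sketch. The first function is the classic method the abstract describes: a random degree-4 polynomial evaluated over a prime field. The second illustrates the lookup idea for $c=2$, which gives $2c-1=3$ table lookups and no multiplications. The abstract does not spell out the table construction, so the derived index `x0 + x1`, the Mersenne prime `P`, the function names, and all parameters are assumptions made for illustration, not the authors' implementation.

```python
import random

# Assumption: keys are below the Mersenne prime 2^61 - 1, which plays the
# role of the "prime field containing the key domain".
P = (1 << 61) - 1


def make_poly_hash(rng, degree=4):
    """Return h(x) = (a_d x^d + ... + a_1 x + a_0) mod P with random coefficients.

    With degree=4 there are 5 random coefficients, the classic
    5-independent polynomial scheme.
    """
    coeffs = [rng.randrange(P) for _ in range(degree + 1)]

    def h(x):
        acc = 0
        for a in coeffs:  # Horner's rule
            acc = (acc * x + a) % P
        return acc

    return h


def make_lookup_hash(rng, char_bits=16, out_bits=32):
    """Hypothetical lookup-based hash for keys of 2 * char_bits bits (c = 2).

    The key is split into two characters x0 and x1; a third, derived index
    x0 + x1 (an assumption of this sketch) selects from a third table.
    Evaluation uses 3 lookups, 2 XORs, and 1 addition.
    """
    size = 1 << char_bits
    t0 = [rng.getrandbits(out_bits) for _ in range(size)]
    t1 = [rng.getrandbits(out_bits) for _ in range(size)]
    # x0 + x1 ranges over 0 .. 2 * (size - 1), hence 2 * size - 1 entries.
    t2 = [rng.getrandbits(out_bits) for _ in range(2 * size - 1)]

    def h(x):
        x0 = x & (size - 1)   # low character
        x1 = x >> char_bits   # high character
        return t0[x0] ^ t1[x1] ^ t2[x0 + x1]

    return h
```

For 32-bit keys, `make_lookup_hash(random.Random(), char_bits=16)` builds three tables of roughly $2^{16}$ to $2^{17}$ entries each, matching the $O(n^{1/c})$ table size for $c=2$. The sketch fills the tables from Python's general-purpose generator; a real deployment would choose the source of randomness to suit its independence requirements.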