HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm

Authors:
Stefan Heule; Marc Nunkesser; Alexander Hall
Affiliations:
ETH Zurich and Google, Inc.; Google, Inc.; Google, Inc.
Venue:
Proceedings of the 16th International Conference on Extending Database Technology
Year:
2013

Abstract

Cardinality estimation has a wide range of applications and is of particular importance in database systems. Various algorithms have been proposed in the past, and the HyperLogLog algorithm is one of them. In this paper, we present a series of improvements to this algorithm that reduce its memory requirements and significantly increase its accuracy for an important range of cardinalities. We have implemented our proposed algorithm for a system at Google and evaluated it empirically, comparing it to the original HyperLogLog algorithm. Like HyperLogLog, our improved algorithm parallelizes perfectly and computes the cardinality estimate in a single pass.
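
To make the single-pass and parallelization claims concrete, here is a minimal Python sketch of the original HyperLogLog estimator. It is an illustration under stated assumptions, not the improved algorithm the paper presents and not the implementation used at Google. The class name, the method names, the default p=12, and the use of the first 8 bytes of SHA-1 as a 64-bit hash are all choices made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative sketch of the original HyperLogLog estimator."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers. The alpha formula used in
        # estimate() assumes m >= 128, hence the lower bound on p.
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 in this sketch")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                  # top p bits select a register
        rest = h & ((1 << width) - 1)     # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining
        # bits, or width + 1 if they are all zero.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Sketches built with the same p combine by register-wise maximum,
        # which is what lets partial sketches be computed in parallel.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different p")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw
```

Each element is hashed once and touches one register, so a stream is processed in a single pass. Two sketches built independently over different partitions of the data can be combined with merge, and the result is identical to a sketch built over the union of the partitions.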