Counting by coin tossings

  • Author: Philippe Flajolet
  • Affiliation: Algorithms Project, INRIA-Rocquencourt, Le Chesnay, France
  • Venue: ASIAN'04, Proceedings of the 9th Asian Computing Science Conference on Advances in Computer Science: dedicated to Jean-Louis Lassez on the Occasion of His 5th Cycle Birthday
  • Year: 2004

Abstract

This text is an informal review of several randomized algorithms that have appeared over the past two decades and have proved instrumental in efficiently extracting quantitative characteristics of very large data sets. The algorithms are probabilistic by nature and based on hashing. They exploit properties of simple discrete probabilistic models, and their design is tightly coupled with their analysis, itself often founded on methods from analytic combinatorics. Singularly efficient solutions have been found that defy the information-theoretic lower bounds applicable to deterministic algorithms. Characteristics like the total number of elements, the cardinality (the number of distinct elements), frequency moments, as well as unbiased samples can be gathered with little loss of information and only a small probability of failure. The algorithms are applicable to traffic monitoring in networks, to database query optimization, and to some of the basic tasks of data mining. They apply to massive data streams and in many cases require strictly minimal auxiliary storage.
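To give a flavor of the "coin tossing" idea behind the algorithms the paper surveys, the following is a minimal Python sketch in the style of Flajolet–Martin probabilistic counting: each element is hashed, the position of the lowest 1-bit of the hash plays the role of a run of coin tosses, and a single small bitmap retains enough information to estimate the number of distinct elements. The hash choice (SHA-256 truncated to 32 bits), the single-bitmap variant, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
import hashlib

def rho(y: int, width: int = 32) -> int:
    """Index (0-based) of the least significant 1-bit of y; width if y == 0.

    Under a uniform hash, P[rho(h) = k] = 2^(-k-1), i.e. the geometric
    law of the first head in a sequence of fair coin tosses.
    """
    if y == 0:
        return width
    return (y & -y).bit_length() - 1

def fm_estimate(stream, width: int = 32) -> float:
    """Estimate the number of distinct elements in `stream` from one bitmap."""
    bitmap = 0
    for item in stream:
        # 32-bit hash of the item (SHA-256 truncated; an illustrative choice).
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:4], "big")
        bitmap |= 1 << rho(h, width)
    # R = index of the lowest zero bit; E[R] is close to log2(phi * n).
    R = 0
    while (bitmap >> R) & 1:
        R += 1
    return (2 ** R) / 0.77351  # phi ~= 0.77351 corrects the systematic bias

# Example: a stream of 100,000 items with true cardinality 1000.
print(fm_estimate(i % 1000 for i in range(100_000)))
```

A single bitmap yields an estimate that is typically only within a constant factor of the truth; the surveyed algorithms sharpen it by maintaining many such sketches (stochastic averaging) while keeping the auxiliary storage tiny.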