This text is an informal review of several randomized algorithms that have appeared over the past two decades and have proved instrumental in efficiently extracting quantitative characteristics of very large data sets. The algorithms are by nature probabilistic and based on hashing. They exploit properties of simple discrete probabilistic models, and their design is tightly coupled with their analysis, which is itself often founded on methods from analytic combinatorics. Singularly efficient solutions have been found that defy the information-theoretic lower bounds applicable to deterministic algorithms. Characteristics such as the total number of elements, the cardinality (the number of distinct elements), frequency moments, and unbiased samples can be gathered with little loss of information and only a small probability of failure. The algorithms are applicable to traffic monitoring in networks, to database query optimization, and to some of the basic tasks of data mining. They apply to massive data streams and in many cases require strictly minimal auxiliary storage.