In many stream monitoring situations, the data arrival rate is so high that it is not even possible to observe each element of the stream. The most common solution is to sample a small fraction of the data stream and use the sample to infer properties and estimate aggregates of the original stream. However, the quantities that need to be computed on the sampled stream are often different from the original quantities of interest and their estimation requires new algorithms. We present upper and lower bounds (often matching) for estimating frequency moments, support size, entropy, and heavy hitters of the original stream from the data observed in the sampled stream.
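To make the gap between sampled and original quantities concrete, here is a minimal sketch that assumes Bernoulli sampling: each element is kept independently with probability p. This is not the algorithm from the paper. It is a textbook unbiased estimator, and it stores every sampled count, so it makes no attempt at small space. The function names and the parameter `p` are illustrative assumptions.

```python
import random
from collections import Counter


def bernoulli_sample(stream, p, rng):
    """Keep each element independently with probability p (assumed sampling model)."""
    return [x for x in stream if rng.random() < p]


def estimate_f1(sample, p):
    """Estimate the original stream length by scaling the sample size by 1/p."""
    return len(sample) / p


def estimate_f2(sample, p):
    """Estimate F2 = sum of f_i^2 of the original stream from the sample.

    If g_i is the sampled count of item i, then g_i ~ Binomial(f_i, p), so
    E[g_i] = p * f_i and E[g_i * (g_i - 1)] = p^2 * f_i * (f_i - 1).
    Since f_i^2 = f_i * (f_i - 1) + f_i, the sum below is unbiased.
    Naively scaling the sample's own F2 by 1/p^2 would instead overestimate
    by (1 - p) / p * f_i per item in expectation.
    """
    counts = Counter(sample)
    return sum(g * (g - 1) / p**2 + g / p for g in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    original = [rng.randrange(50) for _ in range(10_000)]
    sample = bernoulli_sample(original, 0.1, rng)
    true_f2 = sum(f * f for f in Counter(original).values())
    print("true F2:", true_f2, "estimated F2:", estimate_f2(sample, 0.1))
```

The length estimate is a plain rescaling, but the second moment already needs a correction term. Support size and entropy do not admit such a simple fix.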