Randomized algorithms
The space complexity of approximating the frequency moments
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Communication complexity
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice
ACM Transactions on Computer Systems (TOCS)
An improved data stream algorithm for frequency moments
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Optimal approximations of the frequency moments of data streams
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Profiling internet backbone traffic: behavior models and applications
Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications
Entropy Based Worm and Anomaly Detection in Fast IP Networks
WETICE '05 Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprise
Streaming and sublinear approximation of entropy and information distances
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Detecting anomalies in network traffic using maximum entropy estimation
IMC '05 Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement
Data streaming algorithms for estimating entropy of network traffic
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Estimating entropy over data streams
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
A near-optimal algorithm for computing the entropy of a stream
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
A data streaming algorithm for estimating entropies of od flows
Proceedings of the 7th ACM SIGCOMM conference on Internet measurement
Online pairing of VoIP conversations
The VLDB Journal — The International Journal on Very Large Data Bases
Optimal sampling from sliding windows
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Sublinear estimation of entropy and information distances
ACM Transactions on Algorithms (TALG)
A near-optimal algorithm for estimating the entropy of a stream
ACM Transactions on Algorithms (TALG)
Proceedings of the forty-second ACM symposium on Theory of computing
Near-optimal private approximation protocols via a black box transformation
Proceedings of the forty-third annual ACM symposium on Theory of computing
Fast moment estimation in data streams in optimal space
Proceedings of the forty-third annual ACM symposium on Theory of computing
Space-efficient tracking of persistent items in a massive data stream
Proceedings of the 5th ACM international conference on Distributed event-based system
Optimal sampling from sliding windows
Journal of Computer and System Sciences
Testing Closeness of Discrete Distributions
Journal of the ACM (JACM)
Hi-index | 0.00 |
We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result deals with a measure we call the “entropy norm” of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic space one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold. Our second group of results are for estimating the empirical entropy of an input stream. We first present a sublinear space one-pass algorithm for this problem. For a stream of m items and a given real parameter α, our algorithm uses space $\tilde{O}(m^{2\alpha})$ and provides an approximation of 1/α in the worst case and (1+ε) in “most” cases. We then present a two-pass polylogarithmic space (1+ε)-approximation algorithm. All our algorithms are quite simple.