The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
A small approximately min-wise independent family of hash functions
Journal of Algorithms
Reductions in streaming algorithms, with an application to counting triangles in graphs
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Sampling from a moving window over streaming data
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Counting Distinct Elements in a Data Stream
RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
Tight Lower Bounds for the Distinct Elements Problem
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Optimal space lower bounds for all frequency moments
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Optimal approximations of the frequency moments of data streams
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
Space efficient mining of multigraph streams
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Profiling internet backbone traffic: behavior models and applications
Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications
Entropy Based Worm and Anomaly Detection in Fast IP Networks
WETICE '05 Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprise
Simpler algorithm for estimating frequency moments of data streams
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Streaming and sublinear approximation of entropy and information distances
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
On graph problems in a semi-streaming model
Theoretical Computer Science - Automata, languages and programming: Algorithms and complexity (ICALP-A 2004)
Data streaming algorithms for estimating entropy of network traffic
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Detecting anomalies in network traffic using maximum entropy estimation
IMC '05 Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement
Estimating entropy over data streams
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Estimating entropy and entropy norm on data streams
STACS'06 Proceedings of the 23rd Annual conference on Theoretical Aspects of Computer Science
A data streaming algorithm for estimating entropies of OD flows
Proceedings of the 7th ACM SIGCOMM conference on Internet measurement
Declaring independence via the sketching of sketches
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Testing symmetric properties of distributions
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Robust lower bounds for communication and stream computation
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Shape sensitive geometric monitoring
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Sketching information divergences
Machine Learning
Streaming Estimation of Information-Theoretic Metrics for Anomaly Detection (Extended Abstract)
RAID '08 Proceedings of the 11th international symposium on Recent Advances in Intrusion Detection
Finding frequent items in data streams
Proceedings of the VLDB Endowment
Optimal sampling from sliding windows
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Space-optimal heavy hitters with strong error bounds
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Finding the frequent items in streams of data
Communications of the ACM - A View of Parallel Computing
Sublinear estimation of entropy and information distances
ACM Transactions on Algorithms (TALG)
HE-Tree: a framework for detecting changes in clustering structure for categorical data streams
The VLDB Journal — The International Journal on Very Large Data Bases
Methods for finding frequent items in data streams
The VLDB Journal — The International Journal on Very Large Data Bases
Sketching information divergences
COLT'07 Proceedings of the 20th annual conference on Learning theory
Scheduling intense applications most 'surprising' first
Proceedings of the 2010 ACM Symposium on Applied Computing
A near-optimal algorithm for estimating the entropy of a stream
ACM Transactions on Algorithms (TALG)
Proceedings of the forty-second ACM symposium on Theory of computing
Information theory for data management
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Tracking long duration flows in network traffic
INFOCOM'10 Proceedings of the 29th conference on Information communications
Space-optimal heavy hitters with strong error bounds
ACM Transactions on Database Systems (TODS)
On the exact space complexity of sketching and streaming small norms
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Exponential time improvement for min-wise based algorithms
Information and Computation
An optimal lower bound on the communication complexity of gap-hamming-distance
Proceedings of the forty-third annual ACM symposium on Theory of computing
Proceedings of the forty-third annual ACM symposium on Theory of computing
Near-optimal private approximation protocols via a black box transformation
Proceedings of the forty-third annual ACM symposium on Theory of computing
Optimal sampling from sliding windows
Journal of Computer and System Sciences
Exponential time improvement for min-wise based algorithms
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
Checking and spot-checking the correctness of priority queues
ICALP'07 Proceedings of the 34th international conference on Automata, Languages and Programming
CR-PRECIS: a deterministic summary structure for update data streams
ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies
Accelerating frequent item counting with FPGA
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
We describe a simple algorithm for approximating the empirical entropy of a stream of $m$ values in a single pass, using $O(\varepsilon^{-2} \log(\delta^{-1}) \log m)$ words of space. Our algorithm is based upon a novel extension of a method introduced by Alon, Matias, and Szegedy [1]. We show a space lower bound of $\Omega(\varepsilon^{-2} / \log(\varepsilon^{-1}))$, meaning that our algorithm is near-optimal in terms of its dependency on $\varepsilon$. This improves over previous work on this problem [8, 13, 17, 5]. We show that generalizing to $k$th order entropy requires close to linear space for all $k \geq 1$, and give additive approximations using our algorithm. Lastly, we show how to compute a multiplicative approximation to the entropy of a random walk on an undirected graph.
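
To make the starting point concrete, the following is a minimal sketch of the basic sampling estimator in the style of Alon, Matias, and Szegedy, applied to entropy. It is not the algorithm of the paper: the abstract describes a novel extension of this method, and that extension is not reproduced here. The sketch assumes base-2 logarithms and the function f(r) = r log(m/r); all function names are hypothetical and chosen for illustration. The idea is to sample a uniformly random stream position, count how many times the sampled value occurs from that position onward (r), and output f(r) - f(r-1). Averaged over all positions, the terms for each distinct value telescope, so the expectation equals the empirical entropy.

```python
import math
import random
from collections import Counter


def empirical_entropy(stream):
    """Exact empirical entropy in bits; used only as a reference value."""
    m = len(stream)
    return sum((c / m) * math.log2(m / c) for c in Counter(stream).values())


def _f(r, m):
    # f(r) = r * log2(m / r), with f(0) = 0 by convention.
    return r * math.log2(m / r) if r > 0 else 0.0


def estimate_from_position(stream, j):
    """Estimator value when position j (0-based) is the sampled position."""
    m = len(stream)
    # r = occurrences of stream[j] at positions j, j+1, ..., m-1.
    r = sum(1 for x in stream[j:] if x == stream[j])
    return _f(r, m) - _f(r - 1, m)


def one_pass_estimate(stream, rng):
    """Single pass: reservoir-sample one position and count later repeats."""
    sample, r, m = None, 0, 0
    for x in stream:
        m += 1
        if rng.random() < 1.0 / m:  # keep the m-th item with probability 1/m
            sample, r = x, 1        # restart the count at the new position
        elif x == sample:
            r += 1
    if m == 0:
        return 0.0
    return _f(r, m) - _f(r - 1, m)


if __name__ == "__main__":
    data = list("abracadabra")
    rng = random.Random(1)
    runs = [one_pass_estimate(data, rng) for _ in range(2000)]
    print("exact entropy:", empirical_entropy(data))
    print("mean of 2000 basic estimates:", sum(runs) / len(runs))
```

A single copy of this estimator stores only one sampled value and two counters, but its output is noisy. The usual way to reach an (ε, δ) guarantee with this kind of estimator is to average many independent copies and take a median of the averages; bounding the number of copies needed when the entropy is small is the part that requires more than this basic sketch.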