Deterministically Estimating Data Stream Frequencies

Authors:
Sumit Ganguly
Affiliations:
Indian Institute of Technology, Kanpur, India
Venue:
COCOA '09 Proceedings of the 3rd International Conference on Combinatorial Optimization and Applications
Year:
2009

Citing 13
Cited 0

Fast, small-space algorithms for approximate histogram maintenance
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing

Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases

Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming

Histogramming Data Streams with Fast Per-Item Processing
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming

Frequency Estimation of Internet Packet Streams with Limited Space
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms

A simple algorithm for finding frequent elements in streams and bags
ACM Transactions on Database Systems (TODS)

Optimal approximations of the frequency moments of data streams
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing

An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms

Simpler algorithm for estimating frequency moments of data streams
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm

A simpler and more efficient deterministic scheme for finding frequent items over sliding windows
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Lower bounds on frequency estimation of data streams
CSR'08 Proceedings of the 3rd international conference on Computer science: theory and applications

CR-PRECIS: a deterministic summary structure for update data streams
ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies

Abstract

We consider updates to an n-dimensional frequency vector of a data stream, that is, the vector f is updated coordinate-wise by means of insertions or deletions in any arbitrary order. A fundamental problem in this model is to recall the vector approximately, that is, to return an estimate $\hat{f}$ of f such that $\|\hat{f} - f\|_\infty \le \epsilon \|f\|_p$, where $\epsilon$ is an accuracy parameter and p is the index of the $\ell_p$ norm used to calculate the norm $\|f\|_p$. This problem, denoted by $\textsc{ApproxFreq}_p(\epsilon)$, is fundamental in data stream processing and is used to solve a number of other problems, such as heavy hitters, approximating range queries and quantiles, approximate histograms, etc. Suppressing poly-logarithmic factors in n and $\|f\|_1$, for p = 1 the problem is known to have $\tilde{\Theta}(1/\epsilon)$ randomized space complexity [2,4] and $\tilde{\Theta}(1/\epsilon^2)$ deterministic space complexity [6,7]. However, the deterministic space complexity of this problem for any value of p > 1 is not known. In this paper, we show that the deterministic space complexity of the problem is $\tilde{\Theta}(n^{2-2/p}/\epsilon^2)$ for 1 < p < 2, and $\Theta(n)$ for p $\ge$ 2.
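
To make the problem statement concrete, the following is a minimal Python sketch of the update model and of the accuracy test, assuming the guarantee has the usual form "the largest coordinate-wise error is at most $\epsilon$ times the $\ell_p$ norm of f". It stores f exactly, so it uses linear space and is not the paper's algorithm; the function names `apply_updates`, `lp_norm`, and `within_guarantee` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple


def apply_updates(n: int, updates: Iterable[Tuple[int, int]]) -> List[int]:
    """Build the exact frequency vector f from (index, delta) updates."""
    f = [0] * n
    for i, delta in updates:
        # delta > 0 models an insertion, delta < 0 a deletion.
        f[i] += delta
    return f


def lp_norm(f: List[int], p: float) -> float:
    """Return the l_p norm of f for p >= 1."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)


def within_guarantee(f: List[int], f_hat: List[float], eps: float, p: float) -> bool:
    """Check the assumed criterion: max_i |f_hat[i] - f[i]| <= eps * ||f||_p."""
    max_err = max(abs(a - b) for a, b in zip(f, f_hat))
    return max_err <= eps * lp_norm(f, p)


if __name__ == "__main__":
    # Insertions and one deletion, applied in arbitrary order.
    f = apply_updates(3, [(0, 5), (1, 3), (0, -2), (2, 4)])
    f_hat = [3, 4, 4]  # an estimate that is off by 1 in one coordinate
    print(f, within_guarantee(f, f_hat, eps=0.2, p=1))
    print(f, within_guarantee(f, f_hat, eps=0.2, p=2))
```

Because $\|f\|_p$ does not increase as p grows, the same $\epsilon$ permits less absolute error for larger p, which is consistent with the space bound growing with p in the abstract.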