Deterministically Estimating Data Stream Frequencies

  • Authors:
  • Sumit Ganguly

  • Affiliations:
  • Indian Institute of Technology, Kanpur, India

  • Venue:
  • COCOA '09 Proceedings of the 3rd International Conference on Combinatorial Optimization and Applications
  • Year:
  • 2009

Quantified Score

Hi-index 0.02

Visualization

Abstract

We consider updates to an n -dimensional frequency vector of a data stream, that is, the vector f is updated coordinate-wise by means of insertions or deletions in any arbitrary order. A fundamental problem in this model is to recall the vector approximately, that is to return an estimate $\hat{f}$ of f such that where *** is an accuracy parameter and p is the index of the *** p norm used to calculate the norm . This problem, denoted by , is fundamental in data stream processing and is used to solve a number of other problems, such as heavy hitters, approximating range queries and quantiles, approximate histograms, etc.. Suppressing poly-logarithmic factors in n and , for p = 1 the problem is known to have ${\it \tilde{\Theta}}(1/\epsilon)$ randomized space complexity [2,4] and ${\it \tilde{\Theta}}(1/\epsilon^2)$ deterministic space complexity [6,7]. However, the deterministic space complexity of this problem for any value of p 1 is not known. In this paper, we show that the deterministic space complexity of the problem is ${\it \tilde{ \Theta}}(n^{2-2/p}/\epsilon^2)$ for 1 p p *** 2.