On Estimating Frequency Moments of Data Streams

  • Authors:
  • Sumit Ganguly; Graham Cormode

  • Affiliations:
  • Indian Institute of Technology, Kanpur; AT&T Labs-Research

  • Venue:
  • APPROX '07/RANDOM '07 Proceedings of the 10th International Workshop on Approximation and the 11th International Workshop on Randomization, and Combinatorial Optimization. Algorithms and Techniques
  • Year:
  • 2007

Abstract

Space-economical estimation of the $p$th frequency moment, defined as $F_p = \sum_{i=1}^{n} |f_i|^p$ for $p > 0$, is of interest in estimating all-pairs distances in a large data matrix [14], in machine learning, and in data stream computation. Random sketches formed by the inner product of the frequency vector $f_1, \ldots, f_n$ with a suitably chosen random vector were pioneered by Alon, Matias and Szegedy [1], and have since played a central role in estimating $F_p$ and in data stream computations in general. The concept of $p$-stable sketches, formed by the inner product of the frequency vector with a random vector whose components are drawn from a $p$-stable distribution, was proposed by Indyk for estimating $F_p$, for $0 < p < 2$. In this paper, we consider the problem of estimating $F_p$, for $0 < p < 2$. [...] $F_p$, for $0 < p < 2$, [...] sketch [7] and the [...] structure [5]. Our algorithms require space $\tilde{O}(\frac{1}{\epsilon^{2+p}})$ to estimate $F_p$ to within a factor of $1 \pm \epsilon$ and require expected time $O(\log F_1 \log \frac{1}{\delta})$ to process each update. Thus, our technique trades an $O(\frac{1}{\epsilon^p})$ factor in space for much more efficient processing of stream updates. We also present a stand-alone iterative estimator for $F_1$.
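
To make the definition of $F_p$ and the idea of an inner-product sketch concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm of this paper: it computes $F_p$ exactly from a stream of (item, delta) updates and estimates only $F_2$ using random ±1 vectors, in the spirit of the sketches credited to Alon, Matias and Szegedy [1]. The names `exact_moment` and `SignSketch` are hypothetical, and a seeded pseudo-random generator stands in for the limited-independence hash family a real implementation would use.

```python
import random
from collections import Counter


def exact_moment(updates, p):
    """Exact F_p = sum_i |f_i|^p from a stream of (item, delta) updates."""
    freq = Counter()
    for item, delta in updates:
        freq[item] += delta
    return sum(abs(f) ** p for f in freq.values() if f != 0)


class SignSketch:
    """k counters, each the inner product of f with a random +/-1 vector.

    Hypothetical illustration for F_2 only; not the paper's method.
    """

    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed
        self.counters = [0] * k

    def _sign(self, j, item):
        # Assumption: a seeded PRNG stands in for a 4-wise independent
        # hash family; the same (j, item) always maps to the same sign.
        rng = random.Random(f"{self.seed}:{j}:{item}")
        return 1 if rng.random() < 0.5 else -1

    def update(self, item, delta):
        # Every update touches all k counters.
        for j in range(self.k):
            self.counters[j] += self._sign(j, item) * delta

    def estimate_f2(self):
        # Each squared counter has expectation F_2; average them.
        return sum(c * c for c in self.counters) / self.k


if __name__ == "__main__":
    stream = [("a", 3), ("b", 1), ("c", 2), ("a", -1), ("d", 4)]
    sketch = SignSketch(k=1000, seed=1)
    for item, delta in stream:
        sketch.update(item, delta)
    print("exact F_1:", exact_moment(stream, 1))
    print("exact F_2:", exact_moment(stream, 2))
    print("estimated F_2:", sketch.estimate_f2())
```

In this toy version every update modifies all `k` counters, so the per-update cost grows with the number of counters kept. That per-update cost is what the abstract's trade of extra space for faster stream updates is aimed at.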