Fast, small-space algorithms for approximate histogram maintenance

  • Authors:
  • Anna C. Gilbert;Sudipto Guha;Piotr Indyk;Yannis Kotidis;S. Muthukrishnan;Martin J. Strauss

  • Affiliations:
  • AT&T Labs---Research;University of Pennsylvania;MIT;AT&T Labs---Research;AT&T Labs---Research;AT&T Labs---Research

  • Venue:
  • STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

(MATH) A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A3." We give a sketching algorithm, that constructs a small sketch from the stream of updates, and a reconstruction algorithm, that produces a B-bucket piecewise-constant representation (histogram) H for A from the sketch, such that ||A—H||&xie;(1+&egr;)||A—Hopt||, where the error ||A—H|| is either $\ell_1$ (absolute) or $\ell_2$ (root-mean-square) error. The time to process a single update, time to reconstruct the histogram, and size of the sketch are each bounded by poly(B,log(N),log||A,1/&egr;. Our result is obtained in two steps. First we obtain what we call a robust histogram approximation for A, a histogram such that adding a small number of buckets does not help improve the representation quality significantly. From the robust histogram, we cull a histogram of desired accruacy and B buckets in the second step. This technique also provides similar results for Haar wavelet representations, under $\ell_2$ error. Our results have applications in summarizing data distributions fast and succinctly even in distributed settings.