Data-streams and histograms

  • Authors:
  • Sudipto Guha; Nick Koudas; Kyuseok Shim

  • Affiliations:
  • AT&T Research; AT&T Research; Computer Science Department and AITRC, KAIST

  • Venue:
  • STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
  • Year:
  • 2001

Abstract

Histograms have been widely used to capture data distributions by representing the data with a small number of step functions. Dynamic programming algorithms that construct these histograms optimally exist, but they run in quadratic time and linear space. In this paper we provide a linear-time construction of a (1 + ε)-approximation of the optimal histogram, running in polylogarithmic space. Our results extend to the context of data streams, and in fact generalize to give (1 + ε)-approximations for several data-stream problems that require partitioning the index set into intervals. The only assumptions required are that the cost of an interval is monotonic under inclusion (a larger interval has a larger cost) and that the cost can be computed or approximated in small space. This exhibits a nice class of problems for which near-optimal data-stream algorithms exist.
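
For orientation, the quadratic-time dynamic programming baseline that the abstract refers to can be sketched as follows. This is a minimal illustration, not the approximation algorithm of the paper. It assumes the cost of a bucket is the sum of squared errors of its values around their mean (the usual V-optimal criterion, which the abstract does not name), and the function name `v_optimal_histogram` is hypothetical. The sketch takes O(n² B) time for n values and B buckets, and it stores the full O(nB) table so that the bucket boundaries can be recovered.

```python
from itertools import accumulate


def v_optimal_histogram(values, num_buckets):
    """Exact DP baseline: split `values` into at most `num_buckets`
    contiguous buckets minimizing total squared error around bucket means.

    Returns (total_error, buckets), where each bucket is a half-open
    index range (start, end). Assumed cost: sum of squared errors.
    """
    n = len(values)
    if n == 0 or num_buckets < 1:
        raise ValueError("need at least one value and one bucket")
    num_buckets = min(num_buckets, n)

    # Prefix sums let us evaluate the cost of any interval in O(1).
    prefix = [0.0] + list(accumulate(values))
    prefix_sq = [0.0] + list(accumulate(v * v for v in values))

    def sse(i, j):
        # Squared error of values[i:j] around their mean, for i < j.
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # best[k][j]: minimum error covering values[:j] with exactly k buckets.
    best = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    # cut[k][j]: start index of the last bucket in that optimal solution.
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    best[0][0] = 0.0

    for k in range(1, num_buckets + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last bucket is values[i:j]
                candidate = best[k - 1][i] + sse(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    cut[k][j] = i

    # Walk the cut table backwards to recover the bucket boundaries.
    buckets = []
    j = n
    for k in range(num_buckets, 0, -1):
        i = cut[k][j]
        buckets.append((i, j))
        j = i
    buckets.reverse()
    return best[num_buckets][n], buckets


if __name__ == "__main__":
    error, buckets = v_optimal_histogram([1, 1, 1, 5, 5, 5, 9, 9], 3)
    print(error, buckets)
```

The squared-error cost used here satisfies the monotonicity assumption stated in the abstract: enlarging an interval never decreases its error. The inner loop over all possible start positions of the last bucket is what makes this baseline quadratic in n.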