Wavelet synopsis for data streams: minimizing non-Euclidean error

  • Authors:
  • Sudipto Guha; Boulos Harb

  • Affiliations:
  • University of Pennsylvania, Philadelphia, PA; University of Pennsylvania, Philadelphia, PA

  • Venue:
  • Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
  • Year:
  • 2005

Abstract

We consider the wavelet synopsis construction problem for data streams: given n numbers, we wish to estimate the data by constructing a synopsis whose size, say B, is much smaller than n. The B numbers are chosen to minimize a suitable error between the original data and the estimate derived from the synopsis.

Several good one-pass streaming algorithms for wavelet synopsis construction that minimize the l2 error exist. For other error measures, the problem is less understood. We provide the first one-pass, small-space streaming algorithms with provable error guarantees (additive approximation) for minimizing a variety of non-Euclidean error measures, including all weighted lp (including l∞) and relative error lp metrics.

Several previous works proposed solutions (for weighted l2, l∞ and maximum relative error) in which the B synopsis coefficients are restricted to be wavelet coefficients of the data. This restriction yields suboptimal solutions even on fairly simple examples. Other lines of research, such as probabilistic synopses, imposed restrictions on how the synopsis was arrived at. To the best of our knowledge, this paper is the first to address the general problem, without any restriction on how the synopsis is arrived at, and to provide the first streaming algorithms with guaranteed performance for these classes of error measures.
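To make the point about restricted coefficients concrete, the following Python sketch compares a restricted and an unrestricted synopsis of size B = 1 under the l∞ (maximum absolute) error. Everything in it is an assumption chosen for illustration: the data values [1, 4, 5, 6], the unnormalized (averaging) Haar convention, and the function names haar_transform, haar_inverse, max_abs_error and best_restricted. It is an offline brute-force comparison on four numbers, not the streaming algorithm described in the abstract.

```python
from itertools import combinations


def haar_transform(data):
    """Unnormalized Haar transform of a power-of-two-length list.

    Returns [overall average, detail coefficients from coarse to fine].
    """
    approx = list(data)
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        # Prepend this level so that coarser levels end up first.
        details = [(a - b) / 2 for a, b in pairs] + details
        approx = [(a + b) / 2 for a, b in pairs]
    return approx + details


def haar_inverse(coeffs):
    """Invert haar_transform: rebuild the data from the coefficients."""
    approx = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(approx)]
        # Each average a with detail d expands to the pair (a + d, a - d).
        approx = [v for a, d in zip(approx, level) for v in (a + d, a - d)]
        pos += len(level)
    return approx


def max_abs_error(data, estimate):
    """l-infinity error between the data and its estimate."""
    return max(abs(x - y) for x, y in zip(data, estimate))


def best_restricted(data, budget):
    """Brute force: keep `budget` of the data's own coefficients, zero the rest."""
    coeffs = haar_transform(data)
    best = None
    for kept in combinations(range(len(coeffs)), budget):
        synopsis = [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
        err = max_abs_error(data, haar_inverse(synopsis))
        if best is None or err < best[0]:
            best = (err, synopsis)
    return best


data = [1.0, 4.0, 5.0, 6.0]  # illustrative values, assumed for this sketch

# Restricted: the single stored number must be one of the data's coefficients.
restricted_err, restricted_syn = best_restricted(data, 1)

# Unrestricted: store an arbitrary value in the "overall average" position;
# the midpoint of the data range minimizes the maximum error for a constant.
unrestricted_syn = [(min(data) + max(data)) / 2, 0.0, 0.0, 0.0]
unrestricted_err = max_abs_error(data, haar_inverse(unrestricted_syn))

print("restricted:  ", restricted_syn, "max error", restricted_err)
print("unrestricted:", unrestricted_syn, "max error", unrestricted_err)
```

Under these assumptions, the best restricted synopsis keeps the overall average 4 and has maximum error 3, whereas storing 3.5 in the same position gives maximum error 2.5. The stored value 3.5 is not a wavelet coefficient of the data, which is the kind of gap the abstract refers to.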