Workload-optimal histograms on streams

  • Authors:
  • S. Muthukrishnan; M. Strauss; X. Zheng

  • Affiliations:
  • Supported by NSF ITR 0220280 and NSF 0354600, Rutgers University; Supported by NSF DMS 0354600, University of Michigan; Supported by NSF DMS 0354600, University of Michigan

  • Venue:
  • ESA'05 Proceedings of the 13th annual European conference on Algorithms
  • Year:
  • 2005

Abstract

A histogram is a piecewise-constant approximation of an observed data distribution. It serves as a small-space, approximate synopsis of the underlying distribution, which is often too large to store exactly. Histograms have many applications in database management systems, most commonly query selectivity estimation in query optimizers [1], but also approximate query answering [2], load balancing in parallel join execution [3], mining time-series data [4], partition-based temporal join execution, and query profiling for user feedback. Ioannidis gives an overview of the history of histograms, their applications, and their use in commercial DBMSs [5], and Poosala’s thesis provides a systematic treatment of different types of histograms [3].
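
To make the piecewise-constant idea concrete, here is a minimal Python sketch. It assumes the simplest possible construction: equal-width buckets, each storing the mean of its values. This illustrates a histogram synopsis in general and is not the workload-optimal streaming construction the paper studies. The function names (`equiwidth_histogram`, `range_sum_estimate`) and the sample data are made up for the example.

```python
def equiwidth_histogram(data, num_buckets):
    """Summarize `data` as (lo, hi, mean) buckets over index ranges [lo, hi).

    Assumption: buckets have (nearly) equal width; this is one simple
    choice of bucket boundaries, not an optimal one.
    """
    n = len(data)
    buckets = []
    for b in range(num_buckets):
        lo = b * n // num_buckets
        hi = (b + 1) * n // num_buckets
        if lo == hi:
            continue  # skip empty buckets when num_buckets > n
        mean = sum(data[lo:hi]) / (hi - lo)
        buckets.append((lo, hi, mean))
    return buckets


def range_sum_estimate(buckets, a, b):
    """Estimate sum(data[a:b]) using only the histogram."""
    total = 0.0
    for lo, hi, mean in buckets:
        # Number of indices of [a, b) that fall inside this bucket.
        overlap = max(0, min(b, hi) - max(a, lo))
        total += overlap * mean
    return total


data = [1, 1, 2, 2, 9, 9, 10, 10]
hist = equiwidth_histogram(data, 2)
estimate = range_sum_estimate(hist, 2, 6)
exact = sum(data[2:6])
```

A range-sum estimate of this kind is the sort of query a selectivity estimator answers from the synopsis instead of from the full data. The estimate is exact only when the data is constant within each bucket; in general its accuracy depends on where the bucket boundaries fall.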