Optimal and approximate computation of summary statistics for range aggregates

  • Authors:
  • Anna C. Gilbert;Yannis Kotidis;S. Muthukrishnan;Marin J. Strauss

  • Affiliations:
  • AT&T Labs--Research, Florham Park, NJ;AT&T Labs--Research, Florham Park, NJ;AT&T Labs--Research, Florham Park, NJ;AT&T Labs--Research, Florham Park, NJ

  • Venue:
  • PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

Fast estimates for aggregate queries are useful in database query optimization, approximate query answering and online query processing. Hence, there has been a lot of focus on “selectivity estimation”, that is, computing summary statistics on the underlying data and using that to answer aggregate queries fast and to a reasonable approximation. We present two sets of results for range aggregate queries, which are amongst the most common queries.First, we focus on a histogram as summary statistics and present algorithms for constructing histograms that are provably optimal (or provably approximate) for range queries; these algorithms take (pseudo-) polynomial time. These are the first known optimality or approximation results for arbitrary range queries; previously known results were optimal only for restricted range queries (such as equality queries, hierarchical or prefix range queries).Second, we focus on wavelet-based representations as summary statistics and present fast algorithms for picking wavelet statistics that are provably optimal for range queries. No previously-known wavelet-based methods have this property.We perform an experimental study of the various summary representations show the benefits of our algorithms over the known methods.