Accurate estimation of the number of tuples satisfying a condition

  • Authors:
  • Gregory Piatetsky-Shapiro; Charles Connell

  • Affiliations:
  • New York University and Advanced Database Systems Division, Strategic Information, Burlington, Mass.; Boston University and Advanced Database Systems Division, Strategic Information

  • Venue:
  • SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
  • Year:
  • 1984

Abstract

We present a new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", "<", ">", "≤", "≥". The method relies on "distribution steps" (histograms where buckets, instead of having equal width, have equal height). These distribution steps provide an upper bound on the error when estimating the number of tuples satisfying a condition. The estimation error can be made arbitrarily small by increasing the number of steps. We analyze desirable conditions that such estimates should satisfy. Based on the distribution steps, we derive a set of estimation formulas which minimize the worst-case error. We also present another set of formulas which reduce the average-case error. Finally, we show how to use sampling to compute a close approximation of the distribution steps very quickly. The major applications of our method are in query optimization and in answering statistical queries.
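To make the idea of equal-height buckets concrete, the following is a minimal Python sketch, not the formulas from the paper. It assumes the steps are the attribute values found at evenly spaced ranks of the sorted column, and it uses a simple midpoint rule for conditions of the form attribute < constant. The function names (`distribution_steps`, `bounds_less_than`, `estimate_less_than`) are hypothetical and chosen only for illustration.

```python
import bisect


def _position(i, n, num_steps):
    """Rank (0-based) in the sorted column at which step boundary i is read."""
    return (i * (n - 1)) // num_steps


def distribution_steps(values, num_steps):
    """Return num_steps + 1 boundary values that split the sorted column
    into buckets holding roughly the same number of tuples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[_position(i, n, num_steps)] for i in range(num_steps + 1)]


def bounds_less_than(steps, n, constant):
    """Return (low, high) bounds on the number of tuples with value < constant,
    using only the step boundaries and the table size n."""
    num_steps = len(steps) - 1
    j = bisect.bisect_left(steps, constant)  # number of boundaries < constant
    if j == 0:
        return (0, 0)  # constant <= minimum value, so nothing qualifies
    if j == len(steps):
        return (n, n)  # constant > maximum value, so everything qualifies
    # Every rank up to boundary j-1 holds a value < constant, and the value
    # at boundary j is >= constant, so the true count lies between the two.
    low = _position(j - 1, n, num_steps) + 1
    high = _position(j, n, num_steps)
    return (low, high)


def estimate_less_than(steps, n, constant):
    """Midpoint of the bounds; its worst-case error is (high - low) / 2."""
    low, high = bounds_less_than(steps, n, constant)
    return (low + high) / 2


# Example: 100 tuples with values 1..100, summarized by 10 steps.
column = list(range(1, 101))
steps = distribution_steps(column, 10)
print(bounds_less_than(steps, len(column), 35))    # (30, 39); true count is 34
print(estimate_less_than(steps, len(column), 35))  # 34.5
```

In this sketch the gap between `low` and `high` is at most about one bucket, so doubling the number of steps roughly halves the worst-case error. If the steps were computed from a random sample rather than the full column, the bounds would hold only approximately.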