Sampling-based estimators for subset-based queries

  • Authors:
  • Shantanu Joshi; Christopher Jermaine

  • Affiliations:
  • Server Manageability, Oracle, Redwood Shores, USA 94065; Computer and Information Science and Engineering, University of Florida, Gainesville, USA 32611

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2009

Abstract

We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query, where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS, or IN clause. We design an unbiased estimator for such queries and prove that it is unbiased. We then provide a second, biased estimator that uses the superpopulation concept from statistics to minimize the mean squared error of the resulting estimate. Both estimators are evaluated in an extensive set of experiments.
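To make the query class concrete, here is a minimal Python sketch of a SUM over a NOT EXISTS subquery, under stated assumptions. It is not the authors' estimator. It contrasts the exact answer with a hypothetical naive estimator that Bernoulli-samples both relations (rates `p_r` and `p_s`) and scales by 1/`p_r`. All function names, the sampling scheme, and the toy orders/returns data are illustrative assumptions. The closed-form expectation shows why naive scaling is biased. An outer tuple with k matching inner tuples passes the NOT EXISTS test on the sample whenever all k matches are missed, which happens with probability (1 - p_s)^k.

```python
import random
from typing import Any, Callable, Sequence

Value = Callable[[Any], float]        # aggregated expression, e.g. r.amount
Match = Callable[[Any, Any], bool]    # correlation predicate of the subquery


def exact_not_exists_sum(outer: Sequence[Any], inner: Sequence[Any],
                         value: Value, match: Match) -> float:
    # SELECT SUM(value(r)) FROM outer r
    # WHERE NOT EXISTS (SELECT * FROM inner s WHERE match(r, s))
    return sum(value(r) for r in outer if not any(match(r, s) for s in inner))


def naive_estimate(outer: Sequence[Any], inner: Sequence[Any], value: Value,
                   match: Match, p_r: float, p_s: float,
                   rng: random.Random) -> float:
    # Hypothetical naive estimator: Bernoulli-sample each relation, run the
    # query on the samples, and scale by 1/p_r as if only `outer` were sampled.
    outer_sample = [r for r in outer if rng.random() < p_r]
    inner_sample = [s for s in inner if rng.random() < p_s]
    return exact_not_exists_sum(outer_sample, inner_sample, value, match) / p_r


def expected_naive_estimate(outer: Sequence[Any], inner: Sequence[Any],
                            value: Value, match: Match, p_s: float) -> float:
    # E[naive_estimate]: the 1/p_r scaling cancels the outer sampling, and a
    # tuple r with k_r matches passes NOT EXISTS on the inner sample only if
    # all k_r matches are missed, i.e. with probability (1 - p_s) ** k_r.
    total = 0.0
    for r in outer:
        k_r = sum(1 for s in inner if match(r, s))
        total += value(r) * (1 - p_s) ** k_r
    return total


if __name__ == "__main__":
    orders = [(1, 10.0), (2, 20.0), (3, 30.0)]   # (order key, amount)
    returns = [2, 3, 3]                           # keys of returned orders
    amount = lambda r: r[1]
    same_key = lambda r, s: r[0] == s
    print(exact_not_exists_sum(orders, returns, amount, same_key))          # 10.0
    print(expected_naive_estimate(orders, returns, amount, same_key, 0.5))  # 27.5
```

With these toy inputs, the exact answer is 10.0, but the naive estimator's expectation is 27.5: orders 2 and 3 are counted whenever their returns are missed by the inner sample. The sketch only motivates why estimators for subset-based queries need more than uniform scaling. It does not reproduce the paper's unbiased or superpopulation-based estimators.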