On the relative cost of sampling for join selectivity estimation

  • Authors:
  • Peter J. Haas; Jeffrey F. Naughton; Arun N. Swami

  • Affiliations:
  • IBM Almaden Research Center; University of Wisconsin - Madison; IBM Almaden Research Center

  • Venue:
  • PODS '94: Proceedings of the Thirteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
  • Year:
  • 1994

Abstract

We compare the cost of estimating the selectivity of a “star join” using sampling procedure t-cross to the cost of simply computing the join and obtaining the exact answer. Our bounds and approximations for the relative cost of sampling show how this cost depends on the size of the input relations, the number of input relations, and the precision criterion used by the estimation procedure. We also demonstrate the deleterious effect of dangling tuples and the mixed effect of data skew on the relative cost of sampling. These results provide insight into when sampling should or should not be used for join selectivity estimation.
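To make the comparison in the abstract concrete, the following is a minimal sketch of the two alternatives for a star join: computing the exact selectivity from the full inputs, and estimating it from independent random samples of each relation. It is not the paper's t-cross procedure, whose details are not given here; it is a generic cross-product sampling estimator. The function names, the data layout (a center relation whose tuples hold one join key per satellite relation), and the choice of sampling with replacement are all assumptions made for illustration.

```python
import random
from collections import Counter


def exact_selectivity(center, satellites):
    """Exact star-join selectivity: |join| / (product of input sizes).

    Hypothetical layout: center is a list of tuples, where center[i][j]
    is the join key linking that tuple to satellites[j]; each satellite
    is a list of join-key values.
    """
    # Count how often each key occurs in every satellite relation.
    counts = [Counter(sat) for sat in satellites]
    join_size = 0
    for row in center:
        matches = 1
        for key, cnt in zip(row, counts):
            # A dangling tuple has no partner, so cnt[key] is 0.
            matches *= cnt[key]
        join_size += matches
    denom = len(center)
    for sat in satellites:
        denom *= len(sat)
    return join_size / denom


def sampled_selectivity(center, satellites, n, rng):
    """Estimate selectivity from n tuples drawn from each relation.

    Assumption: tuples are drawn uniformly with replacement and
    independently per relation, so the fraction of joining combinations
    in the sample cross product is an unbiased estimate.
    """
    center_sample = [rng.choice(center) for _ in range(n)]
    satellite_samples = [
        [rng.choice(sat) for _ in range(n)] for sat in satellites
    ]
    return exact_selectivity(center_sample, satellite_samples)
```

In this sketch the exact computation touches every input tuple once, while the estimator touches only `n` tuples per relation, for example `sampled_selectivity(center, satellites, 50, random.Random(0))`. When many center tuples are dangling, most sampled combinations contribute nothing, so a larger `n` is needed to reach a given precision, which is one way to picture the deleterious effect the abstract describes.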