Domains and Active Domains: What This Distinction Implies for the Estimation of Projection Sizes in Relational Databases

  • Authors:
  • Paolo Ciaccia; Dario Maio

  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 1995

Abstract

Database optimizers require statistical information about data distributions in order to evaluate result sizes and access plan costs for processing user queries. In this context, we consider the problem of estimating the size of the projections of a database relation when measures of attribute domain cardinalities are maintained in the system. Our main theoretical contribution is a new formal model (AD), valid under the hypotheses of attribute independence and uniform distribution of attribute values, derived by considering the difference between the time-invariant domain (the set of values that an attribute can assume) and the time-dependent "active domain" (the set of values that are actually assumed at a certain time). Early models developed under the same assumptions are shown to be formally incorrect. Since the AD model is computationally demanding, we also introduce an approximate, easy-to-compute model (A2D) that, unlike previous approximations, yields low errors over the entire parameter space of the active domain cardinalities. Finally, we extend the A2D model to the case of nonuniform distributions and present experimental results confirming the good behavior of the model.
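
To make the domain versus active-domain distinction concrete, the following minimal sketch computes both for a toy relation and compares the exact projection size with a simple estimate. The abstract does not give the AD or A2D formulas, so neither is reproduced here: the estimator shown is the classic Cardenas-style formula for the expected number of distinct outcomes under uniform, independent draws with replacement, used purely as an illustrative baseline. The relation, the attribute names (`dept`, `grade`) and all helper functions are hypothetical.

```python
from math import prod

def active_domain(relation, attr):
    """Values that `attr` actually takes in the relation at this moment."""
    return {row[attr] for row in relation}

def projection_size(relation, attrs):
    """Exact number of distinct tuples in the projection onto `attrs`."""
    return len({tuple(row[a] for a in attrs) for row in relation})

def cardenas_estimate(n_tuples, n_combinations):
    """Expected number of distinct combinations hit by n_tuples uniform,
    independent draws with replacement from n_combinations possibilities.
    Illustrative baseline only; this is NOT the paper's AD or A2D model."""
    return n_combinations * (1.0 - (1.0 - 1.0 / n_combinations) ** n_tuples)

# Hypothetical time-invariant domains: every value an attribute may assume.
domains = {"dept": {"A", "B", "C", "D"}, "grade": {1, 2, 3, 4, 5}}

# Hypothetical relation instance: only some domain values are in use.
relation = [
    {"id": 1, "dept": "A", "grade": 1},
    {"id": 2, "dept": "A", "grade": 2},
    {"id": 3, "dept": "B", "grade": 1},
    {"id": 4, "dept": "B", "grade": 1},
    {"id": 5, "dept": "A", "grade": 2},
]

attrs = ("dept", "grade")
exact = projection_size(relation, attrs)

# Number of value combinations, counted over domains and over active domains.
domain_combos = prod(len(domains[a]) for a in attrs)
active_combos = prod(len(active_domain(relation, a)) for a in attrs)

est_from_domains = cardenas_estimate(len(relation), domain_combos)
est_from_active = cardenas_estimate(len(relation), active_combos)

print(exact, est_from_domains, est_from_active)
```

In this toy instance the exact projection has 3 tuples. The estimate built from full domain cardinalities (20 combinations) is about 4.52, while the one built from active domain cardinalities (4 combinations) is about 3.05, which suggests why the choice of cardinality measure matters for the estimate.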