Adaptive query processing: dealing with incomplete and uncertain statistics

  • Authors:
  • David J. DeWitt; Pedro G. Bizarro

  • Affiliations:
  • The University of Wisconsin - Madison; The University of Wisconsin - Madison

  • Venue:
  • Adaptive query processing: dealing with incomplete and uncertain statistics
  • Year:
  • 2006

Abstract

The standard Database Management System (DBMS) query processing model picks a single non-adaptive plan and executes it to completion. The chosen plan aims to minimize running time by carefully optimizing the use of secondary storage, memory, and CPU. DBMS optimizers estimate plan costs by using statistics, that is, information describing the datasets, the queries, and the system. When statistics needed to cost plans are not available in the database catalog, the optimizer estimates them by using heuristics, by assuming that some data distributions are uniform or independent, and by combining other, possibly estimated, statistics or default values. These estimates may contain errors, and the errors grow exponentially with the number of estimated statistics derived from other estimated statistics. This may lead to selecting a query plan that is sub-optimal by several orders of magnitude. Having more information in the catalog (e.g., histograms) reduces the problem but does not scale with the number of relations and attributes in the database. In addition, several hardware and software trends are making this hard problem harder. For example, the optimization space is increasing exponentially because there are more operators to consider, larger datasets to manage, and more complex queries to optimize. Thus, optimizers are increasingly likely to select sub-optimal plans.

In the general case, DBMS optimizers may have insufficient information to choose a single, good, non-adaptive query plan. Instead of focusing on providing more information to the optimizer, we propose several Adaptive Query Processing (AQP) techniques as alternatives or extensions to the non-adaptive architecture employed by today's commercial database systems. Our proposals aim to: (i) correct or avoid query processing problems caused by the use of incorrect and partial information at optimization time and (ii) collect information not available at optimization time and dynamically determine and assign different plans to different subsets of the data. The work presented here complements, extends, or supersedes previous AQP proposals.
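
The abstract's claim that estimation errors grow exponentially when estimates are derived from other estimates can be illustrated with a small sketch. It is not taken from the work itself: it assumes a simplified model in which a cardinality estimate is the product of per-predicate selectivities (the independence assumption) and every selectivity is off by the same multiplicative factor. The function names `estimated_cardinality` and `compounded_error_factor` are hypothetical, chosen only for this illustration.

```python
def estimated_cardinality(base_rows, selectivities):
    """Estimate output rows by multiplying selectivities (independence assumption)."""
    rows = float(base_rows)
    for selectivity in selectivities:
        rows *= selectivity
    return rows


def compounded_error_factor(per_estimate_error, num_estimates):
    """Under the assumed model, n chained estimates that are each off by the
    same multiplicative factor yield a combined error of factor ** n."""
    return per_estimate_error ** num_estimates


# Assumed example: 5 predicates, each with true selectivity 0.25
# but estimated at 0.5 (a 2x error per predicate).
base_rows = 2 ** 20
true_rows = estimated_cardinality(base_rows, [0.25] * 5)
guessed_rows = estimated_cardinality(base_rows, [0.5] * 5)
error_ratio = guessed_rows / true_rows  # equals compounded_error_factor(2.0, 5)
```

In this toy setting a modest 2x error per predicate becomes a 32x error after five predicates, which is the kind of compounding that can push an optimizer toward a plan that is far from optimal.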