Testing the accuracy of query optimizers

Authors:
Zhongxian Gu; Mohamed A. Soliman; Florian M. Waas
Affiliations:
University of California, Davis; Greenplum/EMC; Greenplum/EMC
Venue:
DBTest '12: Proceedings of the Fifth International Workshop on Testing Database Systems
Year:
2012

Abstract

The accuracy of a query optimizer is intricately connected with a database system's performance and its operational cost: the more accurate the optimizer's cost model, the better the resulting execution plans. Database application programmers and other practitioners have long provided anecdotal evidence that database systems differ widely with respect to the quality of their optimizers, yet to date no formal method has been available to database users to assess or refute such claims. In this paper, we develop a framework to quantify an optimizer's accuracy for a given workload. We make use of the fact that optimizers expose switches or hints that let users influence the plan choice and generate plans other than the default plan. Using these mechanisms, we force the generation of multiple alternative plans for each test case, time the execution of all alternatives, and rank the plans by their effective costs. We compare this ranking with the ranking by estimated cost and compute a score for the accuracy of the optimizer. We present initial results of an anonymized comparison of several major commercial database systems, demonstrating that there are in fact substantial differences between systems. We also suggest ways to incorporate this knowledge into the commercial development process.
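The abstract does not spell out the scoring formula. As a minimal sketch, assuming Kendall's rank correlation (tau-a) as a stand-in for the paper's own accuracy score, the comparison of the estimated-cost ranking with the measured-time ranking for each test case, averaged over a workload, might look like this. The function names `kendall_tau_accuracy` and `workload_score` and all sample numbers are hypothetical and do not come from the paper.

```python
from itertools import combinations
from typing import Sequence, Tuple


def kendall_tau_accuracy(estimated: Sequence[float], actual: Sequence[float]) -> float:
    """Kendall's tau-a between optimizer cost estimates and measured run times.

    Stand-in accuracy score (an assumption, not the paper's metric).
    Returns a value in [-1, 1]: 1 means every pair of alternative plans is
    ordered the same way by estimated cost and by measured time, -1 means
    every pair is ordered the opposite way. Tied pairs count as neither.
    """
    if len(estimated) != len(actual):
        raise ValueError("estimated and actual must have the same length")
    n = len(estimated)
    if n < 2:
        raise ValueError("need at least two plans to compare")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both rankings agree on which plan is cheaper.
        sign = (estimated[i] - estimated[j]) * (actual[i] - actual[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def workload_score(cases: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Hypothetical workload-level score: the mean per-test-case tau."""
    if not cases:
        raise ValueError("workload must contain at least one test case")
    scores = [kendall_tau_accuracy(est, act) for est, act in cases]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical test case: four forced plans, estimated cost (optimizer
    # units) versus measured execution time (seconds).
    est = [120.0, 340.0, 95.0, 410.0]
    act = [2.1, 5.8, 2.4, 4.9]
    print(round(kendall_tau_accuracy(est, act), 3))  # prints 0.333
    # A second hypothetical test case whose estimates order the plans perfectly.
    perfect = ([10.0, 20.0, 30.0], [1.0, 2.0, 3.0])
    print(round(workload_score([(est, act), perfect]), 3))  # prints 0.667
```

Plain Kendall's tau treats every mis-ordered pair equally. A score meant to reflect plan quality would plausibly weight swaps that involve the fastest plans, or plans with large runtime differences, more heavily; that refinement is deliberately left out of this sketch.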