Using the optimizer to generate an effective regression suite: a first step

Authors:
M. Muralikrishna
Affiliations:
Hewlett Packard, Cupertino, CA
Venue:
Proceedings of the Third International Workshop on Testing Database Systems
Year:
2010

Abstract

Query optimizers play a critical role in the success of every relational database system. However, regression testing for optimizers remains an ad hoc, tedious, and time-consuming process. Typically, a large number of SQL query suites are employed for regression testing. These suites are either designed manually, at great cost, by development and QA groups, or collected from customers and from benchmarks such as TPC-H and TPC-DS. While these suites are useful in capturing regressions, optimizers continue to be plagued by regressions, and fixing them requires expensive human intervention. This may be because these ad hoc regression queries are redundant, in the sense that they do not cover different parts of the optimizer's plan space. This paper introduces a novel way of using the optimizer itself to generate an economical regression suite. Our approach eliminates the tedium of manually designing a regression suite and removes redundancy from the suite. As a first step towards solving this very difficult problem, in this paper we focus on the join plan space for queries over a small number of tables. We show that our generated queries exhibit 50% more distinct join plans than TPC-H and TPC-DS combined. The generated queries have also been very useful for validating the optimizer's cost functions and hence can be used as a test suite as well. Since this is a new approach, we highlight some of the areas that need a closer look by the research community.
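The abstract does not describe the generation algorithm itself. The sketch below illustrates only the notion of redundancy it relies on: two queries are redundant if the optimizer chooses the same join plan for both, so keeping one query per distinct plan yields a smaller suite with the same plan coverage. It assumes the plans have already been extracted from the optimizer (for example, from EXPLAIN-style output) into a simple nested-tuple join tree. The join-method abbreviations (HJ = hash join, NLJ = nested-loop join, MJ = merge join), the helpers `plan_signature` and `select_non_redundant`, and the sample plans are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical plan representation: a base table name, or a tuple
# (join_method, left_subplan, right_subplan). Real optimizer output would
# have to be parsed into this form first.
Plan = Union[str, Tuple[str, "Plan", "Plan"]]


def plan_signature(plan: Plan) -> str:
    """Serialize a join tree into a canonical string, e.g. 'HJ(NLJ(A,B),C)'."""
    if isinstance(plan, str):
        return plan
    method, left, right = plan
    return f"{method}({plan_signature(left)},{plan_signature(right)})"


def select_non_redundant(suite: Dict[str, Plan]) -> List[str]:
    """Keep the first query seen for each distinct join plan and drop the rest."""
    seen = set()
    kept = []
    for query_id, plan in suite.items():
        sig = plan_signature(plan)
        if sig not in seen:
            seen.add(sig)
            kept.append(query_id)
    return kept


# Hypothetical plans for a four-query suite over tables A, B, and C.
suite: Dict[str, Plan] = {
    "q1": ("HJ", ("NLJ", "A", "B"), "C"),
    "q2": ("HJ", ("NLJ", "A", "B"), "C"),  # same plan as q1, so redundant
    "q3": ("MJ", ("HJ", "B", "C"), "A"),
    "q4": ("HJ", "A", ("HJ", "B", "C")),
}

print(select_non_redundant(suite))  # ['q1', 'q3', 'q4']
print(len({plan_signature(p) for p in suite.values()}))  # 3 distinct join plans
```

The same distinct-signature count could be computed for two different suites to compare their join-plan coverage, which is the kind of comparison the abstract reports against TPC-H and TPC-DS. Treating join order, tree shape, and join method together as the plan identity is a design choice in this sketch; a coarser or finer signature would change what counts as redundant.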