Using the optimizer to generate an effective regression suite: a first step

  • Author: M. Muralikrishna
  • Affiliation: Hewlett Packard, Cupertino, CA
  • Venue: Proceedings of the Third International Workshop on Testing Database Systems
  • Year: 2010

Abstract

Query optimizers play a critical role in the success of every relational database system. However, regression testing for optimizers remains an ad hoc, tedious, and time-consuming process. Typically, a large number of SQL query suites are employed for regression testing. These suites are manually designed at great cost by development and QA groups, or are collected from customers or from benchmarks such as TPC-H and TPC-DS. While these suites are useful in catching regressions, optimizers continue to be plagued by regressions, and fixing the resulting bugs requires expensive human intervention. This may be because these ad hoc regression queries are redundant, in the sense that they do not cover different parts of the optimizer's plan space. This paper introduces a novel approach in which the optimizer itself is used to generate an economical regression suite. Our approach eliminates the tedium of manually designing a regression suite and removes redundancy from the suite. As a first step towards solving this very difficult problem, this paper focuses on the join plan space for queries over a small number of tables. We show that our generated queries exhibit 50% more distinct join plans than TPC-H and TPC-DS combined. The generated queries have also been very useful for validating the optimizer's cost functions and hence can serve as a test suite as well. Since this is a new approach, we highlight some of the areas that need a closer look by the research community.
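
To make the notion of redundancy concrete, the following is a minimal sketch and not the paper's method: it assumes each query's chosen join plan is already available as a nested tuple, and it keeps one query per distinct plan. The function name `dedupe_by_plan`, the operator labels `HJ` and `NLJ`, and the sample suite are all hypothetical, introduced only for illustration.

```python
from typing import Dict, List

# Assumption: a join plan is modeled as a nested tuple of strings,
# e.g. ("HJ", "A", ("NLJ", "B", "C")). The labels are illustrative only.
Plan = tuple


def dedupe_by_plan(suite: Dict[str, Plan]) -> List[str]:
    """Keep the first query seen for each distinct join plan."""
    seen = set()
    kept = []
    for query_id, plan in suite.items():
        if plan not in seen:
            seen.add(plan)
            kept.append(query_id)
    return kept


# Hypothetical suite: q1 and q2 produce the same join plan.
suite = {
    "q1": ("HJ", "orders", "lineitem"),
    "q2": ("HJ", "orders", "lineitem"),  # same plan as q1, so redundant
    "q3": ("NLJ", "orders", "lineitem"),
    "q4": ("HJ", ("NLJ", "orders", "customer"), "lineitem"),
}
kept = dedupe_by_plan(suite)
distinct_plans = len(set(suite.values()))
```

Under this simplified model, the number of distinct plans is a rough measure of how much of the join plan space a suite covers, and any query whose plan has already been seen adds no new coverage.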