Targeted genetic test SQL generation for the DB2 database

  • Authors:
  • Dominic Letarte;Francois Gauthier;Ettore Merlo;Nattavut Sutyanyong;Calisto Zuzarte

  • Affiliations:
  • École Polytechnique de Montréal;École Polytechnique de Montréal;École Polytechnique de Montréal;IBM Canada Ltd.;IBM Canada Ltd.

  • Venue:
  • DBTest '12 Proceedings of the Fifth International Workshop on Testing Database Systems
  • Year:
  • 2012

Abstract

Automatic query generators have been shown to be effective tools for software testing. For the most part, they have been used either in system testing of the database as a whole or to generate specific queries that test specific features with little randomness. In this work we explore the problems encountered when using a genetic algorithm to generate SQL for testing a large database system. General random SQL generation that tests the database system as a whole using genetic algorithms is relatively simple, but one would need to generate millions of test cases to have a reasonable chance of hitting specific combinations of features. To optimize the testing, one needs to generate targeted SQL queries that narrow the testing to specific feature areas and feature combinations, yet preserve a certain amount of randomness and exploit the strength of a genetic algorithm. To do this effectively, the test generator needs to be guided so that it does not stray too far from the goals of the more targeted test requirement. We explore a genetic algorithm approach to generate test queries that exercise target sub-sequences of features. Genetic algorithm parameters such as genome representation, reproduction, fitness evaluation, and selection are described. Preliminary results comparing this approach with a random query generator are presented and discussed. We further present the DB2 SQL Query Optimizer, the application we use as a case study, and target queries that go through certain optimization rule sequences. This application is larger and more complex, in terms of code size and data input complexity, than software previously used for studying test data generation.
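
The abstract names four genetic algorithm ingredients (genome representation, reproduction, fitness evaluation, selection) without giving details. The following is a minimal illustrative sketch of how those pieces might fit together, not the authors' implementation. It assumes a genome is a fixed-length list of feature labels, that fitness counts how many elements of a target sub-sequence appear in order, and that selection is by tournament with one elite kept per generation. The feature labels, the target, and all function names and parameter values are hypothetical, and no SQL is actually generated or run against DB2.

```python
import random

# Hypothetical feature labels and target sub-sequence (not taken from the paper).
FEATURES = ["JOIN", "GROUP_BY", "SUBQUERY", "UNION", "ORDER_BY", "DISTINCT"]
TARGET = ["SUBQUERY", "JOIN", "GROUP_BY"]


def fitness(genome, target):
    """Count how many leading target elements occur, in order, within genome."""
    matched = 0
    for gene in genome:
        if matched < len(target) and gene == target[matched]:
            matched += 1
    return matched


def crossover(a, b, rng):
    """Single-point crossover: head of parent a joined with tail of parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome, rng, rate=0.1):
    """Replace each gene with a random feature with probability `rate`."""
    return [rng.choice(FEATURES) if rng.random() < rate else g for g in genome]


def tournament(pop, target, rng, k=3):
    """Return the fittest of k individuals sampled without replacement."""
    return max(rng.sample(pop, k), key=lambda g: fitness(g, target))


def evolve(target, genome_len=8, pop_size=30, generations=50, seed=0):
    """Evolve genomes toward the target sub-sequence; return the fittest found."""
    rng = random.Random(seed)
    pop = [[rng.choice(FEATURES) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=lambda g: fitness(g, target))
        if fitness(best, target) == len(target):
            break  # the whole target sub-sequence is covered
        children = []
        while len(children) < pop_size - 1:
            a = tournament(pop, target, rng)
            b = tournament(pop, target, rng)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = [best] + children  # elitism: carry the best genome forward
    return max(pop, key=lambda g: fitness(g, target))


if __name__ == "__main__":
    best = evolve(TARGET)
    print(best, fitness(best, TARGET))
```

In a real setting the fitness function would presumably be driven by feedback from the system under test (for example, which optimization rules a generated query triggered) rather than by comparing labels directly; the label comparison here only stands in for that feedback.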