Parallel data generation for performance analysis of large, complex RDBMS

  • Authors:
  • Tilmann Rabl; Meikel Poess

  • Affiliations:
  • Universität Passau, Innstraße, Passau; Oracle Corporation, Oracle Parkway, Redwood Shores

  • Venue:
  • Proceedings of the Fourth International Workshop on Testing Database Systems
  • Year:
  • 2011

Abstract

The exponential growth in the amount of data retained by today's systems is fostered by a recent paradigm shift towards cloud computing and the widespread deployment of data-hungry applications, such as social media sites. At the same time, systems are capturing more sophisticated data. Running realistic benchmarks to test the performance and robustness of these applications is becoming increasingly difficult because of the amount of data that needs to be generated, the number of systems that need to generate the data, and the complex structure of the data. These three reasons are intrinsically connected. Whenever large amounts of data are needed, the generation process needs to be highly parallel, in many cases across systems. Since the structure of the data is becoming more and more complex, its parallel generation is extremely challenging. Over the years there have been many papers about data generators, but there has not been a comprehensive overview of the requirements of today's data generators that covers the most complex problems to be solved. In this paper we present such an overview by analyzing the requirements of today's data generators and either explaining how the problems have been solved in existing data generators or showing why the problems have not been solved yet.