The exponential growth in the amount of data retained by today's systems is driven by the recent paradigm shift toward cloud computing and the widespread deployment of data-hungry applications such as social media sites. At the same time, systems are capturing increasingly sophisticated data. Running realistic benchmarks to test the performance and robustness of these applications is becoming ever more difficult because of the amount of data that must be generated, the number of systems that must generate it, and the complex structure of the data. These three factors are intrinsically connected: whenever large amounts of data are needed, the generation process must be highly parallel, in many cases spanning multiple systems, and because the structure of the data is becoming more and more complex, its parallel generation is extremely challenging. Over the years many papers have described individual data generators, but there has been no comprehensive overview of the requirements of today's data generators that covers the most complex problems to be solved. In this paper we present such an overview by analyzing the requirements of today's data generators and either explaining how each problem has been solved in existing data generators or showing why it has not yet been solved.
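To make the parallelism requirement concrete, one common technique is to derive an independent PRNG seed for each data partition from a master seed, so that any worker on any machine can (re)generate any partition without coordinating with the others. The sketch below illustrates this idea; all names and parameters here are illustrative assumptions, not the API of any of the generators surveyed.

```python
import hashlib
import random

def partition_rows(partition_id, rows_per_partition, master_seed=42):
    """Deterministically generate one partition of synthetic rows.

    Seeding a fresh PRNG from (master_seed, partition_id) decouples the
    partitions: each one can be produced on a different machine, in any
    order, and regenerated bit-for-bit on demand.
    """
    # Hash the pair into a stable 64-bit seed for this partition.
    digest = hashlib.sha256(f"{master_seed}:{partition_id}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    rows = []
    for i in range(rows_per_partition):
        # Globally unique row id computed arithmetically, not via a
        # shared counter -- no cross-worker communication is needed.
        row_id = partition_id * rows_per_partition + i
        value = rng.gauss(100.0, 15.0)  # example attribute distribution
        rows.append((row_id, value))
    return rows

# Partitions can be generated on different machines; rerunning a
# partition reproduces exactly the same rows.
assert partition_rows(7, 3) == partition_rows(7, 3)
```

Determinism of this kind is also what makes references between tables (e.g. foreign keys) feasible in parallel generation: a worker can recompute any referenced row locally instead of looking it up elsewhere.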