Generating data sets for the performance testing of database systems on a particular hardware configuration and application domain is a time-consuming and tedious process. It is time consuming because of the large amount of data that needs to be generated, and tedious because new data generators might need to be developed or existing ones adjusted. The difficulty in generating this data is amplified by constant advances in hardware and software that allow the testing of ever larger and more complex systems. In this paper, we present an approach for rapidly developing customized data generators. Our approach, which is based on the Parallel Data Generator Framework (PDGF), introduces a new concept of so-called meta generators. Meta generators extend the concept of column-based generators in PDGF. Deploying meta generators in PDGF significantly reduces the development effort of customized data generators, facilitates their debugging, and eases their maintenance.
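The idea of a meta generator wrapping a column-based generator can be illustrated with a minimal sketch. This is not the actual PDGF API (PDGF is a Java framework and its class names differ); all names here are hypothetical. The sketch assumes the two properties the abstract implies: each column generator derives its value deterministically from the row number (so data can be regenerated in parallel), and a meta generator adds behavior by wrapping an existing generator instead of requiring a new custom one.

```python
import random

class ColumnGenerator:
    """Hypothetical column-based generator: produces the value of one
    column, deterministically derived from the row number."""
    def __init__(self, seed):
        self.seed = seed

    def value(self, row):
        raise NotImplementedError

class IntRangeGenerator(ColumnGenerator):
    """Generates a pseudo-random integer in [lo, hi] per row."""
    def __init__(self, seed, lo, hi):
        super().__init__(seed)
        self.lo, self.hi = lo, hi

    def value(self, row):
        # Per-row seeding makes every row independently repeatable,
        # which is what allows parallel (out-of-order) generation.
        rng = random.Random(self.seed * 1_000_003 + row)
        return rng.randint(self.lo, self.hi)

class NullMetaGenerator(ColumnGenerator):
    """Hypothetical meta generator: wraps an inner generator and
    injects NULLs with a given probability, so no new custom
    generator has to be written for this common variation."""
    def __init__(self, seed, inner, null_prob):
        super().__init__(seed)
        self.inner, self.null_prob = inner, null_prob

    def value(self, row):
        # Independent per-row stream decides NULL vs. delegation.
        rng = random.Random(self.seed * 2_000_003 + row)
        if rng.random() < self.null_prob:
            return None
        return self.inner.value(row)

# Usage: compose an existing generator with a meta generator.
base = IntRangeGenerator(seed=7, lo=1, hi=100)
col = NullMetaGenerator(seed=7, inner=base, null_prob=0.1)
rows = [col.value(r) for r in range(5)]
# Determinism: regenerating the same rows yields identical values.
assert rows == [col.value(r) for r in range(5)]
```

Because the meta generator conforms to the same interface as an ordinary column generator, wrappers of this kind can be stacked, which is how a composition mechanism can cut the amount of bespoke generator code.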