A data generator for cloud-scale benchmarking

Authors:
Tilmann Rabl;Michael Frank;Hatem Mousselly Sergieh;Harald Kosch
Affiliations:
Information Systems, University of Passau, Germany;Information Systems, University of Passau, Germany;Information Systems, University of Passau, Germany;Information Systems, University of Passau, Germany
Venue:
TPCTC'10 Proceedings of the Second TPC technology conference on Performance evaluation, measurement and characterization of complex systems
Year:
2010

Abstract

In many fields of research and business, data sizes are breaking the petabyte barrier. This poses new problems and opens new research possibilities for the database community. Data of this size is usually stored in large clusters or clouds. Although clouds have become very popular in recent years, there has been little work on benchmarking cloud applications. In this paper we present a data generator for cloud-sized applications. Its architecture makes the data generator easy to extend and configure. A key feature is its high degree of parallelism, which allows linear scaling for arbitrary numbers of nodes. We show how distributions, relationships, and dependencies in data can be computed in parallel with linear speedup.
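
The abstract does not say how relationships are computed in parallel, so the following is only a minimal sketch of one common approach, not the authors' implementation. It assumes every value is derived from a seed computed from its table, column, and row number, so a node can recompute a foreign key without reading another table or talking to other nodes. All names here (`cell_seed`, `order_customer_ref`, `generate_partition`, the `orders` table, the master seed of 42) are hypothetical.

```python
import hashlib
import struct


def cell_seed(table: str, column: str, row: int, master_seed: int = 42) -> int:
    """Derive a 64-bit seed from a cell's coordinates alone (assumed scheme)."""
    key = f"{master_seed}/{table}/{column}/{row}".encode()
    # Use the first 8 bytes of the SHA-256 digest as an unsigned integer.
    return struct.unpack("<Q", hashlib.sha256(key).digest()[:8])[0]


def order_customer_ref(order_row: int, n_customers: int) -> int:
    """Foreign key for an order row, computed without reading the customer table."""
    return cell_seed("orders", "customer_id", order_row) % n_customers


def generate_partition(node: int, n_nodes: int, n_orders: int, n_customers: int):
    """Generate the contiguous block of order rows owned by one node."""
    start = node * n_orders // n_nodes
    end = (node + 1) * n_orders // n_nodes
    return [(row, order_customer_ref(row, n_customers)) for row in range(start, end)]


if __name__ == "__main__":
    # Four independent "nodes" each generate their own block of rows.
    parts = [generate_partition(node, 4, 1000, 50) for node in range(4)]
    print(sum(len(p) for p in parts), "rows generated across 4 partitions")
```

Because each row depends only on its own coordinates, concatenating the partitions produced by any number of nodes gives the same data as a single-node run. Under this assumed scheme, that independence is what lets the work be split across nodes without coordination.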