Myriad: scalable and expressive data generation

Authors: Alexander Alexandrov, Kostas Tzoumas, Volker Markl
Affiliation: Technische Universität Berlin, Germany (all authors)
Venue: Proceedings of the VLDB Endowment
Year: 2012

Abstract

The current research focus on Big Data systems calls for a rethinking of data generation methods. The traditional sequential approach to data generation is ill-suited to large-scale systems, as generating a terabyte of data may take days or even weeks, depending on the number of constraints imposed on the generated model. We demonstrate Myriad, a new data generation toolkit for specifying semantically rich data generator programs that can scale out linearly in a shared-nothing environment. Data generation programs built on top of Myriad follow an efficient parallel execution strategy, enabled by the extensive use of pseudo-random number generators with random access support.
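The role of random access in the parallel execution strategy described above can be illustrated with a small sketch. The abstract does not specify Myriad's actual generator, so this assumes a 64-bit linear congruential generator (LCG) with Knuth's MMIX constants as a stand-in; `skip`, `generate_partition` and `generate_parallel` are hypothetical names. The key property is that `skip` computes the generator state at any position in O(log n) steps, so each worker can jump directly to the start of its partition and produce exactly the values a sequential run would produce there.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: a 64-bit LCG with Knuth's MMIX constants stands in for
# Myriad's random-access PRNG, which is not described in the abstract.
A = 6364136223846793005
C = 1442695040888963407
MASK = (1 << 64) - 1  # arithmetic modulo 2**64


def skip(seed: int, n: int) -> int:
    """Return the LCG state after n steps from seed in O(log n) time.

    One step is the affine map x -> A*x + C (mod 2**64). We raise it to the
    n-th power by repeated squaring, composing affine maps as
    (a1, c1) o (a2, c2) = (a1*a2, a1*c2 + c1).
    """
    acc_a, acc_c = 1, 0  # identity map
    cur_a, cur_c = A, C  # single-step map, squared on each iteration
    while n > 0:
        if n & 1:
            # acc <- cur o acc
            acc_a, acc_c = (cur_a * acc_a) & MASK, (cur_a * acc_c + cur_c) & MASK
        # cur <- cur o cur
        cur_a, cur_c = (cur_a * cur_a) & MASK, (cur_a * cur_c + cur_c) & MASK
        n >>= 1
    return (acc_a * seed + acc_c) & MASK


def generate_partition(seed: int, start: int, end: int) -> list[int]:
    """Generate records [start, end) without touching any other partition."""
    state = skip(seed, start)  # jump straight to position `start`
    out = []
    for _ in range(start, end):
        state = (A * state + C) & MASK
        out.append(state >> 33)  # hypothetical attribute: top 31 bits
    return out


def generate_parallel(seed: int, total: int, workers: int) -> list[int]:
    """Split [0, total) into contiguous ranges and generate them concurrently."""
    bounds = [total * w // workers for w in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda w: generate_partition(seed, bounds[w], bounds[w + 1]),
            range(workers),
        )
        # pool.map yields results in partition order, so concatenation
        # reproduces the sequential stream.
        return [value for part in parts for value in part]


if __name__ == "__main__":
    sequential = generate_partition(42, 0, 1000)
    parallel = generate_parallel(42, 1000, 4)
    assert sequential == parallel
    print(parallel[:3])
```

Threads stand in for shared-nothing nodes here. Because the partitions share no state, the same function could run unchanged on separate machines, which is what lets generation scale out. Python threads do not execute CPU-bound code in parallel, so the sketch demonstrates the correctness of the partitioning rather than a speedup.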