The current research focus on Big Data systems calls for a rethinking of data generation methods. The traditional sequential approach to data generation is poorly suited to large-scale systems: generating a terabyte of data may take days or even weeks, depending on the number of constraints imposed on the generated model. We demonstrate Myriad, a new data generation toolkit for specifying semantically rich data generator programs that scale out linearly in a shared-nothing environment. Data generation programs built on top of Myriad implement an efficient parallel execution strategy, made possible by the extensive use of pseudo-random number generators with random access support.
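
To make the idea of a pseudo-random number generator with random access support concrete, here is a minimal sketch in Python. It is not Myriad's implementation or API. It assumes a SplitMix64-style generator, chosen only because its i-th output can be computed directly from the seed and the index. The record layout and the names `splitmix64_at`, `generate_record`, and `generate_partition` are hypothetical.

```python
MASK64 = (1 << 64) - 1
GOLDEN_GAMMA = 0x9E3779B97F4A7C15  # SplitMix64 state increment


def splitmix64_at(seed: int, index: int) -> int:
    """Return output number `index` of a SplitMix64 stream in O(1).

    The state after k steps is seed + k * GOLDEN_GAMMA (mod 2^64), so any
    position can be reached without computing the earlier outputs.
    """
    z = (seed + (index + 1) * GOLDEN_GAMMA) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def generate_record(seed: int, row_id: int) -> tuple:
    """Build one record from its row id alone (hypothetical two-field schema)."""
    # Each field reads its own fixed position in the stream.
    age = 18 + splitmix64_at(seed, 2 * row_id) % 63
    balance = splitmix64_at(seed, 2 * row_id + 1) % 100_000
    return (row_id, age, balance)


def generate_partition(seed: int, total_rows: int, node_id: int, num_nodes: int) -> list:
    """Generate the contiguous slice of rows assigned to one node."""
    start = total_rows * node_id // num_nodes
    end = total_rows * (node_id + 1) // num_nodes
    return [generate_record(seed, row_id) for row_id in range(start, end)]


if __name__ == "__main__":
    # Four simulated nodes, each generating its slice independently.
    for node in range(4):
        rows = generate_partition(seed=42, total_rows=20, node_id=node, num_nodes=4)
        print(node, rows[0], "...", rows[-1])
```

Because every record depends only on the seed and its row id, the nodes need no communication or shared state. Concatenating the partitions gives the same data set as a single sequential run, which is the property that lets this style of generator scale out in a shared-nothing setting.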