A parallel general-purpose synthetic data generator

Authors:
Joseph E. Hoag;Craig W. Thompson
Affiliations:
University of Arkansas;University of Arkansas
Venue:
ACM SIGMOD Record
Year:
2007

Citing 5
Cited 14

Quickly generating billion-record synthetic databases

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
MUDD: a multi-dimensional data generator

WOSP '04 Proceedings of the 4th international workshop on Software and performance
Flexible database generators

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Development of a Synthetic Data Set Generator for Building and Testing Information Discovery Systems

ITNG '06 Proceedings of the Third International Conference on Information Technology: New Generations
Simple and realistic data generation

VLDB '06 Proceedings of the 32nd international conference on Very large data bases

A statistical and combinatorial approach to text file layout inference

Journal of Computing Sciences in Colleges
Constrained anonymization of production data: a constraint satisfaction problem approach

SDM'10 Proceedings of the 7th VLDB conference on Secure data management
A data generator for cloud-scale benchmarking

TPCTC'10 Proceedings of the Second TPC technology conference on Performance evaluation, measurement and characterization of complex systems
Parallel data generation for performance analysis of large, complex RDBMS

Proceedings of the Fourth International Workshop on Testing Database Systems
SemGen: towards a semantic data generator for benchmarking duplicate detectors

DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications
Efficient update data generation for DBMS benchmarks

ICPE '12 Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
A tool for generating synthetic authorship records for evaluating author name disambiguation methods

Information Sciences: an International Journal
Myriad: scalable and expressive data generation

Proceedings of the VLDB Endowment
Myriad: parallel data generation on shared-nothing architectures

Proceedings of the 1st Workshop on Architectures and Systems for Big Data
Scalable test data generation from multidimensional models

Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering
Rapid development of data generators using meta generators in PDGF

Proceedings of the Sixth International Workshop on Testing Database Systems
Reversing statistics for scalable test databases generation

Proceedings of the Sixth International Workshop on Testing Database Systems
A taxonomy of privacy-preserving record linkage techniques

Information Systems
UpSizeR: Synthetically scaling an empirical relational database

Information Systems

Abstract

PSDG is a parallel synthetic data generator designed to generate "industrial sized" data sets quickly using cluster computing. PSDG depends on SDDL, a synthetic data description language that provides flexibility in the types of data we can generate.