Today's business intelligence systems consist of hundreds of processors with disk subsystems able to sustain multiple gigabytes per second of I/O bandwidth. These systems usually contain terabytes of data. Evaluating the performance of such systems often requires generating synthetic data with well-defined statistical properties. To simulate different scenarios, it is important to be able to vary these properties, including the row counts of tables. Above all, to analyze large-scale systems, data generators must be able to produce hundreds of terabytes of data in a timely fashion. In this paper we present MUDD, a multi-dimensional data generator. Originally designed for TPC-DS, a decision support benchmark being developed by the TPC, MUDD can generate up to 100 terabytes of flat-file data in hours by exploiting modern multiprocessor architectures, including clusters. Its novel design separates data generation algorithms from data distribution definitions, enabling users to adjust their workload to individual needs and different scenarios.
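
The separation of generation algorithms from distribution definitions can be illustrated with a minimal sketch. This is not MUDD's implementation or file format: the `DISTRIBUTIONS` table, the column names, the weights, the per-row seeding scheme, and the functions `generate_row` and `generate_chunk` are all hypothetical, chosen only to show the idea. The distributions are plain data that a user could edit, while the generator code stays unchanged. Seeding each row from its row number is one possible way to let independent workers produce disjoint row ranges without coordination.

```python
import random

# Hypothetical distribution definitions: plain data, editable without
# touching the generator code. Values and weights are made up.
DISTRIBUTIONS = {
    "category": {"values": ["Books", "Music", "Sports"], "weights": [5, 3, 2]},
    "quantity": {"values": [1, 2, 3, 4], "weights": [4, 3, 2, 1]},
}


def generate_row(row_id, columns, seed=42):
    """Build one row; the generator knows nothing about specific columns."""
    # Assumption: a per-row seed makes every row reproducible on its own,
    # regardless of which worker generates it or in what order.
    rng = random.Random(seed * 1_000_003 + row_id)
    row = [row_id]
    for name in columns:
        dist = DISTRIBUTIONS[name]
        row.append(rng.choices(dist["values"], weights=dist["weights"])[0])
    return row


def generate_chunk(start, stop, columns, seed=42):
    """Return flat-file lines (pipe-delimited) for rows in [start, stop)."""
    return [
        "|".join(str(field) for field in generate_row(i, columns, seed))
        for i in range(start, stop)
    ]


if __name__ == "__main__":
    # Two "workers" generating disjoint row ranges of the same table.
    for line in generate_chunk(0, 3, ["category", "quantity"]):
        print(line)
    for line in generate_chunk(3, 6, ["category", "quantity"]):
        print(line)
```

In this sketch, changing the workload means editing `DISTRIBUTIONS` (or the row range) only. Because each row depends solely on the seed and its row number, splitting the range across processes or cluster nodes yields the same output as a single sequential run.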