MUDD: a multi-dimensional data generator

Authors:
John M. Stephens; Meikel Poess
Affiliations:
Gradient Systems, Redwood City, CA; Oracle Corporation, Redwood Shores, CA
Venue:
WOSP '04 Proceedings of the 4th international workshop on Software and performance
Year:
2004

Abstract

Today's business intelligence systems consist of hundreds of processors with disk subsystems capable of sustaining multiple gigabytes per second of I/O bandwidth, and they usually hold terabytes of data. Evaluating the performance of such database systems often requires generating synthetic data with well-defined statistical properties. To simulate different scenarios, it is important to be able to vary these statistical properties, including the row counts of tables. Above all, to analyze large-scale systems, data generators must be able to produce hundreds of terabytes of data in a timely fashion. In this paper we present MUDD, a multi-dimensional data generator. Originally designed for TPC-DS, a decision support benchmark being developed by the TPC, MUDD can generate up to 100 terabytes of flat-file data in hours by exploiting modern multiprocessor architectures, including clusters. Its novel design separates data generation algorithms from data distribution definitions, enabling users to adjust their workload to individual needs and different scenarios.
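The abstract does not describe MUDD's internals, so the following is only a minimal Python sketch of the two ideas it names: keeping distribution definitions apart from the generation algorithm, and splitting generation across processors. It assumes distributions are stored as weighted value lists and that each row draws from its own random stream seeded by its row number (a per-row seeding technique chosen here for illustration, not confirmed by the paper). All names (`DISTRIBUTIONS`, `WeightedPicker`, `generate_row`, `generate_range`, `partition`, `scale_factor`) are hypothetical and are not MUDD's API.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Hypothetical distribution definitions, kept as plain data and separate from
# the generation code. Each entry maps a column to (value, integer weight) pairs.
# Editing these weights changes the data's statistics without touching code.
DISTRIBUTIONS = {
    "color": [("red", 5), ("green", 3), ("blue", 2)],
    "size": [("S", 1), ("M", 2), ("L", 1)],
}


class WeightedPicker:
    """Draws values from a discrete (value, weight) distribution by inverse-CDF lookup."""

    def __init__(self, pairs):
        self.values = [value for value, _ in pairs]
        self.cumulative = list(accumulate(weight for _, weight in pairs))
        self.total = self.cumulative[-1]

    def pick(self, rng):
        # Uniform integer in [0, total); bisect_right finds the bucket containing it.
        x = rng.randrange(self.total)
        return self.values[bisect_right(self.cumulative, x)]


PICKERS = {column: WeightedPicker(pairs) for column, pairs in DISTRIBUTIONS.items()}


def generate_row(row_id, base_seed=42):
    # Assumption: seed a private RNG from the row number alone, so a row's
    # contents do not depend on which worker generates it or in what order.
    rng = random.Random(f"{base_seed}:{row_id}")
    row = {"id": row_id}
    for column in sorted(PICKERS):  # fixed column order keeps draws reproducible
        row[column] = PICKERS[column].pick(rng)
    return row


def generate_range(start, stop, base_seed=42):
    """Generates rows [start, stop); any worker can produce any range independently."""
    return [generate_row(i, base_seed) for i in range(start, stop)]


def partition(total_rows, workers):
    """Splits [0, total_rows) into `workers` contiguous, nearly equal ranges."""
    step, extra = divmod(total_rows, workers)
    bounds, start = [], 0
    for w in range(workers):
        stop = start + step + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


if __name__ == "__main__":
    scale_factor = 2  # hypothetical knob: the table's row count grows with scale
    total_rows = 1_000 * scale_factor
    chunks = [generate_range(a, b) for a, b in partition(total_rows, 4)]
    merged = [row for chunk in chunks for row in chunk]
    assert merged == generate_range(0, total_rows)  # same data, however it is split
    print(len(merged), merged[0])
```

Because each row's random stream depends only on its row number, the ranges returned by `partition` could be handed to separate processes or cluster nodes, and their concatenated output would match a single-process run. Per-row seeding trades some speed for that independence; a real generator might instead seed per chunk or skip ahead in a single stream.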