Flexible database generators

Authors:
Nicolas Bruno;Surajit Chaudhuri
Affiliations:
Microsoft Corp., One Microsoft Way, Redmond, WA;Microsoft Corp., One Microsoft Way, Redmond, WA
Venue:
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year:
2005


Abstract

The evaluation and applicability of many database techniques, ranging from access methods, histograms, and optimization strategies to data normalization and mining, crucially depend on their ability to cope robustly with varying data distributions. However, comprehensive real data is often hard to come by, and there is no flexible data generation framework capable of modelling varying rich data distributions. This has led individual researchers to develop their own ad hoc data generators for specific tasks. As a consequence, the resulting data distributions and query workloads are often hard to reproduce, analyze, and modify, which prevents their wider use. In this paper, we present a flexible, easy-to-use, and scalable framework for database generation. We then discuss how to map several proposed synthetic distributions to our framework and report preliminary results.