Evaluating database system performance often requires generating synthetic databases: ones that have certain statistical properties but are filled with dummy information. When evaluating different database designs, it is often necessary to generate several databases and evaluate each design. As database sizes grow to terabytes, generation often takes longer than evaluation. This paper presents several database generation techniques. In particular, it discusses: (1) parallelism to get generation speedup and scaleup; (2) congruential generators to get dense, unique, uniform distributions; (3) special-case discrete logarithms to generate indices concurrently with the base table generation; (4) modification of (2) to get exponential, normal, and self-similar distributions.

The discussion is in terms of generating billion-record SQL databases using C programs running on a shared-nothing computer system consisting of a hundred processors with a thousand discs. The ideas apply to smaller databases, but large databases present the more difficult problems.
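As an illustration of technique (2), the following is a minimal sketch of how a multiplicative congruential generator can emit every key in 1..n exactly once, in a scrambled order. The abstract does not give the construction, so everything here is an assumption for illustration: the function names `next_key` and `fill_keys`, the choice of a prime modulus `p` just above `n`, and a primitive root `g` modulo `p`. It relies only on the standard fact that the powers of a primitive root visit every value in 1..p-1 once per cycle; values above `n` are skipped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: next element of the sequence x -> (g * x) mod p,
   skipping values above n.
   Assumes p is prime, g is a primitive root mod p, n < p, and 1 <= x < p. */
uint64_t next_key(uint64_t x, uint64_t g, uint64_t p, uint64_t n)
{
    assert(n >= 1 && n < p && x >= 1 && x < p);
    do {
        x = (x * g) % p;      /* no overflow as long as g * p < 2^64 */
    } while (x > n);          /* discard values outside 1..n */
    return x;
}

/* Hypothetical helper: fill keys[0..n-1] with a permutation of 1..n.
   Because the sequence cycles through 1..p-1 exactly once, each value
   in 1..n appears exactly once among the first n accepted outputs. */
void fill_keys(uint64_t *keys, uint64_t n, uint64_t g, uint64_t p,
               uint64_t seed)
{
    uint64_t x = seed;
    for (uint64_t i = 0; i < n; i++) {
        x = next_key(x, g, p, n);
        keys[i] = x;
    }
}
```

For example, with n = 8, p = 11, g = 2, and seed 1, `fill_keys` produces 2, 4, 8, 5, 7, 3, 6, 1: the values 10 and 9 occur in the underlying cycle but are skipped because they exceed n. Choosing p close to n keeps the number of skipped values small.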