We study the problem of generating synthetic databases with declaratively specified characteristics. This problem is motivated by database system and application testing, data masking, and benchmarking. While data generation has been studied before, prior approaches are either non-declarative or have fundamental limitations in the data characteristics they can capture and efficiently support. We argue that a natural, expressive, and declarative mechanism for specifying data characteristics is through cardinality constraints: a cardinality constraint specifies that the output of a query over the generated database have a certain cardinality. While the data generation problem is intractable in general, we present efficient algorithms that handle a large and useful class of constraints. We include a thorough empirical evaluation showing that our algorithms handle complex constraints, scale well as the number of constraints increases, and outperform applicable prior techniques.
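To make the notion of a cardinality constraint concrete, here is a minimal sketch of a deliberately simple special case. It is not the paper's algorithm. It assumes a single relation `R` with one integer attribute `age`. Each constraint is a selection predicate paired with a target output size, and the range predicates are assumed to be pairwise disjoint. The names `CardinalityConstraint`, `satisfied`, and `generate_disjoint_ranges`, and the `age` attribute, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


@dataclass
class CardinalityConstraint:
    """Hypothetical representation: |sigma_pred(R)| must equal `count`."""
    description: str
    pred: Callable[[Row], bool]
    count: int


def satisfied(table: List[Row], c: CardinalityConstraint) -> bool:
    # Evaluate the selection query and compare its output size with the target.
    return sum(1 for row in table if c.pred(row)) == c.count


def generate_disjoint_ranges(
    total: int, ranges: List[Tuple[int, int, int]], filler: int
) -> List[Row]:
    """Naive generator for constraints on pairwise-disjoint half-open ranges
    [lo, hi) of the hypothetical attribute `age`.

    Emits k rows with value lo for each (lo, hi, k), then pads the table to
    `total` rows with `filler`, which is assumed to lie outside every range.
    """
    table: List[Row] = []
    for lo, hi, k in ranges:
        if not lo < hi:
            raise ValueError(f"empty range [{lo}, {hi})")
        # Any value in [lo, hi) would do; lo is the simplest choice.
        table.extend({"age": lo} for _ in range(k))
    if len(table) > total:
        raise ValueError("range counts exceed the total table size")
    table.extend({"age": filler} for _ in range(total - len(table)))
    return table


if __name__ == "__main__":
    ranges = [(20, 30, 400), (30, 40, 250)]
    table = generate_disjoint_ranges(1000, ranges, filler=99)
    constraints = [
        CardinalityConstraint("|R| = 1000", lambda r: True, 1000),
        CardinalityConstraint("|sigma_{20<=age<30}(R)| = 400",
                              lambda r: 20 <= r["age"] < 30, 400),
        CardinalityConstraint("|sigma_{30<=age<40}(R)| = 250",
                              lambda r: 30 <= r["age"] < 40, 250),
    ]
    print(all(satisfied(table, c) for c in constraints))  # True
```

Disjointness is what makes this toy case trivial: each constraint can be met independently. Once constraints overlap, span multiple attributes, or involve joins, a naive generator of this kind no longer suffices. That is the regime in which the general problem becomes intractable and the more careful algorithms described above are needed.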