Quickly generating billion-record synthetic databases
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Artificial intelligence: a modern approach
Artificial intelligence: a modern approach
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Simple and realistic data generation
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
QAGen: generating query-aware test databases
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
A parallel general-purpose synthetic data generator
ACM SIGMOD Record
UpSizeR: Synthetically scaling an empirical relational database
Information Systems
Hi-index | 0.00 |
The use of production data which contains sensitive information in application testing requires that the production data be anonymized first. The task of anonymizing production data becomes difficult since it usually consists of constraints which must also be satisfied in the anonymized data. We propose a novel approach to anonymize constrained production data based on the concept of constraint satisfaction problems. Due to the generality of the constraint satisfaction framework, our approach can support a wide variety of mandatory integrity constraints as well as constraints which ensure the similarity of the anonymized data to the production data. Our approach decomposes the constrained anonymization problem into independent sub-problems which can be represented and solved as constraint satisfaction problems (CSPs). Since production databases may contain many records that are associated by vertical constraints, the resulting CSPs may become very large. Such CSPs are further decomposed into dependant sub-problems that are solved iteratively by applying local modifications to the production data. Simulations on synthetic production databases demonstrate the feasibility of our method.