Constrained anonymization of production data: a constraint satisfaction problem approach

Authors:
Ran Yahalom;Erez Shmueli;Tomer Zrihen
Affiliations:
Deutsche Telekom Laboratories;Deutsche Telekom Laboratories and Department of Information Systems Engineering, Ben-Gurion University, Beer Sheva, Israel;Deutsche Telekom Laboratories and Department of Information Systems Engineering, Ben-Gurion University, Beer Sheva, Israel
Venue:
SDM'10 Proceedings of the 7th VLDB conference on Secure data management
Year:
2010

Abstract

Using production data that contains sensitive information for application testing requires that the data first be anonymized. Anonymization is difficult because production data is usually subject to constraints that must also hold in the anonymized data. We propose a novel approach to anonymizing constrained production data based on constraint satisfaction problems (CSPs). Owing to the generality of the constraint satisfaction framework, our approach supports a wide variety of mandatory integrity constraints as well as constraints that ensure the anonymized data remains similar to the production data. The approach decomposes the constrained anonymization problem into independent sub-problems, each of which can be represented and solved as a CSP. Since production databases may contain many records that are associated by vertical constraints, the resulting CSPs may become very large. Such CSPs are further decomposed into dependent sub-problems that are solved iteratively by applying local modifications to the production data. Simulations on synthetic production databases demonstrate the feasibility of our method.
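
To make the CSP formulation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical single-column table of salaries, a small hand-picked value domain, and three kinds of constraints: each anonymized value must differ from its original, the relative order between records must be preserved (a stand-in for a similarity constraint), and the pairwise order constraints play the role of constraints that span several records. The solver is plain chronological backtracking; the names `solve`, `consistent`, and `anonymize_salaries` are illustrative only, and the iterative local-modification scheme described in the abstract is not reproduced here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[str, int]
# A constraint is a scope (the variables it mentions) plus a predicate over their values.
Constraint = Tuple[Tuple[str, ...], Callable[..., bool]]


def consistent(assignment: Assignment, constraints: List[Constraint]) -> bool:
    """Check every constraint whose variables have all been assigned."""
    for scope, predicate in constraints:
        if all(v in assignment for v in scope):
            if not predicate(*(assignment[v] for v in scope)):
                return False
    return True


def solve(
    variables: List[str],
    domains: Dict[str, List[int]],
    constraints: List[Constraint],
    assignment: Optional[Assignment] = None,
) -> Optional[Assignment]:
    """Chronological backtracking: return the first full assignment found, or None."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return dict(assignment)
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment, constraints):
            result = solve(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]  # undo and try the next value
    return None


def anonymize_salaries(production: Dict[str, int]) -> Optional[Assignment]:
    """Hypothetical example: one CSP variable per record's salary value."""
    variables = sorted(production)
    # Assumed domain: salaries in steps of 1000 (an illustrative choice).
    domains = {v: list(range(1000, 6001, 1000)) for v in variables}
    constraints: List[Constraint] = []

    # Anonymity: each anonymized value must differ from the production value.
    for v in variables:
        constraints.append(((v,), lambda x, orig=production[v]: x != orig))

    # Similarity across records: preserve the pairwise order of the originals.
    for i, a in enumerate(variables):
        for b in variables[i + 1:]:
            if production[a] < production[b]:
                constraints.append(((a, b), lambda x, y: x < y))
            elif production[a] > production[b]:
                constraints.append(((a, b), lambda x, y: x > y))
            else:
                constraints.append(((a, b), lambda x, y: x == y))

    return solve(variables, domains, constraints)


if __name__ == "__main__":
    production = {"r1": 2000, "r2": 3000, "r3": 5000}
    print(anonymize_salaries(production))
```

In this toy setting every record is linked to every other by an order constraint, so the problem cannot be split into independent parts. With constraints that touch only some of the records, unrelated groups of records could each be handed to `solve` separately.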