Accounting for Intruder Uncertainty Due to Sampling When Estimating Identification Disclosure Risks in Partially Synthetic Data

Authors: Jörg Drechsler; Jerome P. Reiter
Affiliations: Institute for Employment Research, 90478 Nuremberg, Germany; Duke University, Durham, NC 27708, USA
Venue: PSD '08 Proceedings of the UNESCO Chair in data privacy international conference on Privacy in Statistical Databases
Year: 2008

Cited by 3:

Random Forests for Generating Partially Synthetic, Categorical Data. Transactions on Data Privacy.
Using support vector machines for generating synthetic datasets. PSD'10 Proceedings of the 2010 international conference on Privacy in statistical databases.
Differential Privacy and Statistical Disclosure Risk Measures: An Investigation with Binary Synthetic Data. Transactions on Data Privacy.

Abstract

Partially synthetic data comprise the units originally surveyed, with some collected values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple draws from statistical models. Because the original records remain on the file, intruders may be able to link those records to external databases, even though values are synthesized. We illustrate how statistical agencies can evaluate the risks of identification disclosures before releasing such data. We compute risk measures both when intruders know who is in the sample and when they do not. We use classification and regression trees to synthesize data from the U.S. Current Population Survey.
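
The contrast between the two intruder scenarios can be illustrated with a small matching-based risk calculation. The sketch below is not the paper's estimator; it is a simplified illustration under stated assumptions: a single released dataset rather than multiple synthetic draws, exact matching on key variables, and a hypothetical `population_counts` table standing in for the intruder's uncertainty about who was sampled. The function name `expected_match_risk` and all data values are made up for the example.

```python
from collections import Counter


def expected_match_risk(released, targets, keys, population_counts=None):
    """Sum over targets of 1/denominator when the true record is among the matches.

    released: list of dicts; list index is the unit id (records stay on the file).
    targets: list of (unit_id, known_values) pairs held by the intruder.
    keys: names of the key variables used for matching.
    population_counts: optional dict mapping a key combination to the number of
        population units sharing it (assumed input, used when the intruder does
        not know who is in the sample).
    """
    # Number of released records sharing each combination of key values.
    sample_counts = Counter(tuple(r[k] for k in keys) for r in released)
    risk = 0.0
    for unit_id, known in targets:
        combo = tuple(known[k] for k in keys)
        matches = sample_counts.get(combo, 0)
        if matches == 0:
            continue  # no released record looks like this target
        # The true record is a candidate only if synthesis left its keys intact.
        own = tuple(released[unit_id][k] for k in keys)
        if own != combo:
            continue
        denominator = matches
        if population_counts is not None:
            # Assumption: the intruder must also consider unsampled look-alikes.
            denominator = max(matches, population_counts.get(combo, matches))
        risk += 1.0 / denominator
    return risk


keys = ("age", "sex")
released = [
    {"age": 30, "sex": "F"},  # unit 0: keys unchanged
    {"age": 30, "sex": "F"},  # unit 1: age synthesized (true age was 45)
    {"age": 52, "sex": "M"},  # unit 2: keys unchanged
]
targets = [
    (0, {"age": 30, "sex": "F"}),
    (1, {"age": 45, "sex": "F"}),
    (2, {"age": 52, "sex": "M"}),
]

known_sample = expected_match_risk(released, targets, keys)
unknown_sample = expected_match_risk(
    released, targets, keys,
    population_counts={(30, "F"): 10, (52, "M"): 4},
)
print(known_sample, unknown_sample)
```

In this toy setting the risk is lower when the intruder cannot rule out unsampled population units, because each match has to be shared among more candidates.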