Disclosure risk of synthetic population data with application in the case of EU-SILC

Authors:
Matthias Templ; Andreas Alfons
Affiliations:
Department of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria and Methods Unit, Statistics Austria, Vienna, Austria; Department of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria
Venue:
PSD'10 Proceedings of the 2010 International Conference on Privacy in Statistical Databases
Year:
2010

Abstract

In survey statistics, simulation studies are usually performed by repeatedly drawing samples from population data. Population data may also be used in courses on survey statistics to support the theory with practical examples. However, real population data containing the information of interest are generally not available, so synthetic data need to be generated. Ensuring data confidentiality is essential, while the simulated data should be as realistic as possible. This paper briefly outlines a recently proposed method for generating close-to-reality population data for complex (household) surveys and applies it to generate a population for the Austrian EU-SILC (European Union Statistics on Income and Living Conditions) data. Based on this synthetic population, confidentiality issues are discussed using five different worst-case scenarios. In all scenarios, the intruder has complete information on the key variables from the real survey data. It is shown that even in these worst-case scenarios the synthetic population data are confidential. In addition, the synthetic data are of high quality.
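
To make the worst-case setting more concrete, the following is a minimal sketch of one generic check an intruder-style analysis might start from: counting survey records whose combination of key variables is unique in the survey and also appears exactly once in the synthetic population. This is an illustration only and is not one of the five scenarios from the paper, which the abstract does not specify. The function name `unique_key_matches`, the variable names `region`, `hsize` and `age`, and all records are hypothetical and made up for the example.

```python
from collections import Counter


def unique_key_matches(survey, population, keys):
    """Count survey records whose key combination is unique in the survey
    and occurs exactly once in the synthetic population.

    Hypothetical helper for illustration; not the paper's risk measure.
    """
    def key_of(record):
        # Project a record onto the chosen key variables.
        return tuple(record[k] for k in keys)

    survey_counts = Counter(key_of(r) for r in survey)
    population_counts = Counter(key_of(r) for r in population)

    # Counter returns 0 for combinations it has never seen.
    return sum(
        1
        for combo, n in survey_counts.items()
        if n == 1 and population_counts[combo] == 1
    )


# Toy, made-up records (not EU-SILC data).
survey = [
    {"region": "A", "hsize": 1, "age": 34},
    {"region": "A", "hsize": 2, "age": 51},
    {"region": "B", "hsize": 4, "age": 29},
]
population = [
    {"region": "A", "hsize": 1, "age": 34},
    {"region": "A", "hsize": 2, "age": 51},
    {"region": "A", "hsize": 2, "age": 51},
    {"region": "B", "hsize": 3, "age": 29},
]

matches = unique_key_matches(survey, population, ["region", "hsize", "age"])
print(matches)
```

A one-to-one match on key variables would not by itself mean that a real value is revealed, since the remaining attributes of a synthetic record are generated rather than copied. A count like this is only a starting point for judging whether the synthetic values attached to matched records come close to the real ones.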