Disclosure risk of synthetic population data with application in the case of EU-SILC

  • Authors:
  • Matthias Templ; Andreas Alfons

  • Affiliations:
  • Department of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria and Methods Unit, Statistics Austria, Vienna, Austria; Department of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria

  • Venue:
  • PSD'10 Proceedings of the 2010 international conference on Privacy in statistical databases
  • Year:
  • 2010

Abstract

In survey statistics, simulation studies are usually performed by repeatedly drawing samples from population data. Population data may also be used in courses on survey statistics to support the theory with practical examples. However, real population data containing the information of interest are generally not available, so synthetic data need to be generated. Ensuring data confidentiality is essential, while the simulated data should be as realistic as possible. This paper briefly outlines a recently proposed method for generating close-to-reality population data for complex (household) surveys, which is applied to generate a population for Austrian EU-SILC (European Union Statistics on Income and Living Conditions) data. Based on this synthetic population, confidentiality issues are discussed using five different worst-case scenarios. In all scenarios, the intruder has complete information on the key variables from the real survey data. It is shown that even in these worst-case scenarios the synthetic population data are confidential. In addition, the synthetic data are of high quality.
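
The simulation design mentioned in the first sentence of the abstract, repeatedly drawing samples from a population, can be illustrated with a minimal sketch. This is not the authors' method or data: the population below is a made-up log-normal income variable, and the function name `simulate_sampling`, the simple random sampling design, and all parameter values are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_sampling(population, sample_size, n_reps, seed=0):
    """Draw n_reps simple random samples (without replacement) from the
    population and return the sample mean of each one.

    Hypothetical helper for illustration; real EU-SILC simulations would
    use a complex household sampling design, not simple random sampling.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        sample = rng.sample(population, sample_size)
        estimates.append(statistics.mean(sample))
    return estimates


# Assumed toy population: 10,000 log-normally distributed "incomes".
pop_rng = random.Random(42)
population = [pop_rng.lognormvariate(10, 0.5) for _ in range(10_000)]

estimates = simulate_sampling(population, sample_size=500, n_reps=200)

# Compare the average of the estimates with the known population value.
true_mean = statistics.mean(population)
bias = statistics.mean(estimates) - true_mean
relative_bias = bias / true_mean
```

Because the full population is known, the estimator's behaviour (here, the relative bias of the sample mean) can be evaluated directly against the true value, which is why such studies need population data rather than a single survey sample.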