Quantile-based bootstrap methods to generate continuous synthetic data

  • Authors: Daniela Ichim
  • Affiliations: Istituto Nazionale di Statistica, Rome, Italy
  • Venue: Proceedings of the 2010 EDBT/ICDT Workshops
  • Year: 2010

Abstract

To meet increasing demand from users, National Statistical Institutes (NSIs) release a range of information products. These products must be disseminated in full compliance with the regulations protecting the privacy of respondents. Synthetic data are one product that could belong to a dissemination portfolio. This paper gives a very brief review of several methods for generating synthetic data, with emphasis on bootstrap methods that might be used in complex surveys. A quantile-based bootstrap method that avoids any model assumption is proposed. Different bootstrap strategies were compared empirically in terms of selected univariate statistics and within a linear regression framework. Data from the 2006 Italian Structure of Earnings Survey were used in these preliminary experiments.
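The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of one generic way a model-free, quantile-based resampling scheme for a continuous variable could look, not the author's method. It assumes that synthetic values are obtained by evaluating the interpolated empirical quantile function at uniform random probabilities. The function name `quantile_bootstrap` and the simulated earnings-like data are hypothetical, and survey weights and complex-design features are ignored.

```python
import numpy as np


def quantile_bootstrap(values, n_synthetic, rng):
    """Hypothetical helper: draw synthetic values from the interpolated
    empirical quantile function of `values` (no parametric model assumed)."""
    values = np.asarray(values, dtype=float)
    # A plain bootstrap would resample the observed values themselves.
    # Here uniform probabilities are pushed through the empirical quantile
    # function instead, so synthetic values can fall between observed ones.
    probs = rng.uniform(0.0, 1.0, size=n_synthetic)
    return np.quantile(values, probs)  # linear interpolation by default


rng = np.random.default_rng(0)
# Simulated stand-in for a skewed continuous variable; not survey data.
original = rng.lognormal(mean=10.0, sigma=0.5, size=1000)
synthetic = quantile_bootstrap(original, original.size, rng)

# Compare a few univariate statistics of original and synthetic data.
for name, stat in [("mean", np.mean), ("median", np.median), ("std", np.std)]:
    print(f"{name}: original={stat(original):.1f} synthetic={stat(synthetic):.1f}")
```

Because the draws are interpolated between order statistics, the synthetic values stay within the observed range; whether this offers adequate disclosure protection for extreme values is a separate question that the sketch does not address.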