Quantile-based bootstrap methods to generate continuous synthetic data

Authors:
Daniela Ichim
Affiliations:
Istituto Nazionale di Statistica, Rome, Italy
Venue:
Proceedings of the 2010 EDBT/ICDT Workshops
Year:
2010

Abstract

To meet increasing user demand, National Statistical Institutes (NSIs) release a range of information products. This information must be disseminated in full compliance with the regulations protecting the privacy of respondents. Synthetic data are one product that could belong to a dissemination portfolio. This paper gives a very brief review of several methods for generating synthetic data, with emphasis on bootstrap methods that might be used in complex surveys. A quantile-based bootstrap method that avoids any model assumption is proposed. Different bootstrap strategies were compared empirically, both on some univariate statistics and in a linear regression framework. These preliminary experiments used data from the Italian Structure of Earnings Survey 2006.
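
The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of one generic way to resample a continuous variable through its quantiles without fitting a model: draw uniform probabilities and map each one through the empirical quantile function, interpolating linearly between order statistics. The function name `quantile_bootstrap`, its parameters, and the interpolation rule are assumptions made for illustration, not the method evaluated in the paper, and the sketch ignores survey weights and the complex-survey design.

```python
import numpy as np

def quantile_bootstrap(values, n_synthetic, rng=None):
    """Illustrative sketch (hypothetical helper, not the paper's algorithm).

    Generates continuous synthetic values by inverting the empirical
    quantile function of `values` at uniformly drawn probabilities.
    """
    rng = np.random.default_rng(rng)
    values = np.asarray(values, dtype=float)
    # One uniform probability per synthetic record.
    u = rng.uniform(0.0, 1.0, size=n_synthetic)
    # np.quantile interpolates linearly between order statistics by default,
    # so outputs are continuous and need not coincide with observed values.
    return np.quantile(values, u)

# Usage with made-up earnings figures (not data from the survey).
earnings = np.array([1200.0, 1500.0, 1800.0, 2500.0, 4000.0])
synthetic = quantile_bootstrap(earnings, n_synthetic=10, rng=0)
```

Unlike a classical bootstrap, which only ever repeats observed records, this variant can return values lying between observed ones, although it never goes beyond the observed minimum and maximum.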