Statistical Disclosure Control for Microdata Using the R-Package sdcMicro
Transactions on Data Privacy
Disclosure risk of synthetic population data with application in the case of EU-SILC
PSD'10 Proceedings of the 2010 international conference on Privacy in statistical databases
The aim of this study is to evaluate the re-identification risk captured by distance-based disclosure risk measures for numerical variables. First, we review disclosure risk measures that have already been proposed. Unfortunately, none of these measures accounts for outliers. We assume that outliers must be protected more strongly than observations near the center of the data cloud. Therefore, we propose a weighting scheme for each observation based on robust Mahalanobis distances. We also consider the peculiarities of different protection methods and adapt our measures so that they give realistic results for each method. To test the proposed distance-based disclosure risk measures, we run a simulation study with different amounts of data contamination. The results of the simulation study show the usefulness of the proposed measures and give deeper insight into how the disclosure risk of quantitative data can be measured successfully. All methods proposed in this paper, together with the protection methods and measures used, are implemented in the R package sdcMicro, which is freely available from the Comprehensive R Archive Network (http://cran.r-project.org).
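The idea of weighting each observation by its outlyingness can be illustrated with a minimal sketch. It is not the measure defined in the paper or the sdcMicro implementation. Several parts are assumptions made only for illustration: the robust distance uses the coordinate-wise median and MAD (a diagonal stand-in for a full robust Mahalanobis distance, which would use a robust covariance estimate such as the MCD), and the weight function, the `cutoff` and `k` parameters, the interval rule, and the toy data are all hypothetical.

```python
from statistics import median


def robust_center_scale(rows):
    """Coordinate-wise median and MAD-based scale for each variable."""
    cols = list(zip(*rows))
    centers = [median(c) for c in cols]
    # 1.4826 makes the MAD consistent with the standard deviation under normality.
    scales = [1.4826 * median([abs(v - m) for v in c]) for c, m in zip(cols, centers)]
    if any(s == 0 for s in scales):
        raise ValueError("MAD is zero for at least one variable")
    return centers, scales


def robust_distances(rows, centers, scales):
    """Distance from the robust center, ignoring correlations (simplification)."""
    return [
        sum(((v - m) / s) ** 2 for v, m, s in zip(row, centers, scales)) ** 0.5
        for row in rows
    ]


def outlier_weights(distances, cutoff=3.0):
    """Hypothetical weights: 1 for central records, growing linearly beyond cutoff."""
    return [max(1.0, d / cutoff) for d in distances]


def at_risk(original, masked, scales, weights, k=0.5):
    """Flag a record if every masked value lies within k * weight * scale
    of the corresponding original value (assumed interval rule)."""
    return [
        all(abs(o - m) <= k * w * s for o, m, s in zip(orig, mask, scales))
        for orig, mask, w in zip(original, masked, weights)
    ]


# Toy data: four central records and one clear outlier (last row).
original = [(10.0, 100.0), (11.0, 102.0), (12.0, 98.0), (13.0, 101.0), (50.0, 300.0)]
masked = [(10.5, 100.4), (13.0, 102.0), (12.1, 97.0), (13.2, 101.3), (55.0, 310.0)]

centers, scales = robust_center_scale(original)
distances = robust_distances(original, centers, scales)
weights = outlier_weights(distances)

flags_weighted = at_risk(original, masked, scales, weights)
flags_unweighted = at_risk(original, masked, scales, [1.0] * len(original))

print("weights:         ", [round(w, 2) for w in weights])
print("at risk weighted:", flags_weighted)
print("at risk uniform: ", flags_unweighted)
```

In this toy example the outlier receives the largest perturbation, so a uniform interval treats it as safe. With the distance-based weight its interval widens and it is still flagged, which mirrors the assumption that outliers need stronger protection than central observations.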