Statistical Disclosure Control for Microdata Using the R-Package sdcMicro

  • Authors:
  • Matthias Templ

  • Affiliations:
  • Dept of Methodology, Statistics Austria, Vienna, Austria. Dept of Statistics and Probability Theory, Vienna Univ of Technology, Vienna, Austria. e-mail: matthias.templ@statistik.gv.at/ templ@tuwie ...

  • Venue:
  • Transactions on Data Privacy
  • Year:
  • 2008

Abstract

The demand for high-quality microdata for analytical purposes has grown rapidly among researchers and the public over the last few years. To comply with existing data privacy laws while still providing microdata to researchers and the public, statistical institutes, agencies and other institutions may release masked data. With our flexible software tools, which allow protection methods to be applied in an exploratory manner, it is possible to generate high-quality confidential (micro-)data. In this paper we present highly flexible and easy-to-use software for the generation of anonymized microdata and give insights into the implementation and design of the R-Package sdcMicro. R is a highly extensible system for statistical computing and graphics, distributed over the net. sdcMicro contains almost all popular methods for the anonymization of both categorical and continuous variables. Furthermore, several new methods have been implemented. The package can also be used to compare methods and to measure the information loss and disclosure risk of the masked data.