Software development for SDC in R

  • Authors:
  • M. Templ

  • Affiliations:
  • Statistics Austria, Vienna

  • Venue:
  • PSD'06 Proceedings of the 2006 CENEX-SDC project international conference on Privacy in Statistical Databases
  • Year:
  • 2006

Abstract

The production of scientific-use files from economic microdata is a major problem. Many common methods change the data in a way that leaves the univariate distribution of each variable almost unchanged relative to the original data, but the multivariate structure of the data is often ruined. Which methods are suitable depends strongly on the underlying data. What is needed is a program system with which one can apply different methods and evaluate and compare the results of different algorithms in a flexible way. Using methods for protecting microdata as an exploratory data analysis tool requires a powerful program system that can present the results in a number of easy-to-grasp graphics. For this purpose, some of the most popular procedures for anonymising microdata are implemented in a flexible R package. The R system offers flexible data import/export facilities and advanced development tools for building such disclosure control software. In addition to algorithms already available in other software (MDAV algorithm for microaggregation, ...), some new algorithms for anonymising microdata are implemented, e.g. a fast algorithm for microaggregation based on a projection pursuit approach. This algorithm outperforms other existing algorithms on most real data sets. For all these algorithms and methods, print, summary and plot methods as well as methods for validation are implemented. In economics, suppression of cells in marginal tables is likely to be the method statistical agencies use most often to protect tables. Linear programming seems to be the best way to carry out cell suppression for both ordinary and hierarchical tables. Several R packages for various fields of disclosure control are currently under development. Because of the integrated online help, with examples ready to be executed, the applications of disclosure control are easy to learn even with little previous knowledge.
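The abstract names MDAV (maximum distance to average vector) as one of the existing microaggregation algorithms. To make the idea concrete, here is a minimal Python sketch of the classic fixed-size MDAV heuristic as it is usually described in the literature. It is not the R package's code and not the projection pursuit algorithm the abstract introduces. It assumes purely numeric records and plain Euclidean distance, and it skips the standardization of variables that is commonly applied first. The function names `mdav_groups` and `microaggregate` are hypothetical.

```python
import math
from typing import List, Sequence

Record = Sequence[float]


def _centroid(rows: List[Record]) -> List[float]:
    """Component-wise mean of a non-empty list of records."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mdav_groups(data: List[Record], k: int) -> List[List[int]]:
    """Partition record indices into groups of size k..2k-1 (MDAV heuristic).

    Sketch only: Euclidean distance on unstandardized variables (assumption).
    """
    if k < 1 or len(data) < k:
        raise ValueError("need 1 <= k <= number of records")
    remaining = list(range(len(data)))
    groups: List[List[int]] = []

    def take_group(seed: int) -> None:
        # The seed plus its k-1 nearest remaining neighbours; the tie-breaker
        # keeps the seed itself in front of exact duplicates.
        nearest = sorted(
            remaining,
            key=lambda i: (math.dist(data[i], data[seed]), i != seed),
        )[:k]
        for i in nearest:
            remaining.remove(i)
        groups.append(nearest)

    def farthest_from(point: Record) -> int:
        return max(remaining, key=lambda i: math.dist(data[i], point))

    while len(remaining) >= 3 * k:
        c = _centroid([data[i] for i in remaining])
        r = farthest_from(c)        # record farthest from the centroid
        take_group(r)
        s = farthest_from(data[r])  # remaining record farthest from r
        take_group(s)
    if len(remaining) >= 2 * k:
        c = _centroid([data[i] for i in remaining])
        take_group(farthest_from(c))
    if remaining:                   # the k..2k-1 leftovers form the last group
        groups.append(list(remaining))
    return groups


def microaggregate(data: List[Record], k: int) -> List[List[float]]:
    """Replace every record by the centroid of its MDAV group."""
    masked: List[List[float]] = [list(row) for row in data]
    for group in mdav_groups(data, k):
        c = _centroid([data[i] for i in group])
        for i in group:
            masked[i] = list(c)
    return masked


if __name__ == "__main__":
    data = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1],
            [5.0, 6.0], [5.1, 6.2], [4.8, 5.9],
            [9.0, 1.0], [9.2, 1.1], [8.9, 0.8]]
    for row in microaggregate(data, k=3):
        print(row)
```

Every record in a group is replaced by the group mean, so column means are preserved (up to floating-point rounding) and each group holds between k and 2k - 1 records. Within-group variation, however, is removed, which is one way the multivariate structure mentioned above can be distorted.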