Comparison of Remote Analysis with Statistical Disclosure Control for Protecting the Confidentiality of Business Data

  • Authors:
  • Christine M. O'Keefe; Natalie Shlomo

  • Affiliations:
  • CSIRO Mathematics, Informatics and Statistics, GPO Box 664, Canberra ACT 2601 AUSTRALIA. e-mail: Christine.OKeefe@csiro.au; Southampton Statistical Sciences Research Institute, University of Southampton, Southampton SO17 1BJ UK. e-mail: N.Shlomo@soton.ac.uk

  • Venue:
  • Transactions on Data Privacy
  • Year:
  • 2012

Abstract

This paper is concerned with the challenge of allowing statistical analysis of confidential business data while maintaining confidentiality. The most widely used approach to date is statistical disclosure control, which involves modifying or confidentialising data before releasing it to users. Newer proposed approaches include the release of multiply imputed synthetic data in place of the original data, and the use of a remote analysis system enabling users to submit statistical queries and receive output without direct access to data. Most implementations of statistical disclosure control methods to date involve census or survey microdata on individual persons, because existing methods are generally acknowledged to provide inadequate confidentiality protection to business (or enterprise) data. In this paper we seek to compare the statistical disclosure control approach with the remote analysis approach, in the context of protecting the confidentiality of business data in statistical analysis. We provide an example which enables a side-by-side comparison of the outputs of exploratory data analysis and linear regression analysis conducted on a sample business dataset under these two approaches, and provide traditional unconfidentialised results as a standard for comparison. The remote analysis approach has both advantages and disadvantages, and it is unlikely that remote analysis will replace statistical disclosure control methods in all applications. If the disadvantages are judged too serious in a given situation, the analyst may have to seek access to the unconfidentialised dataset. However, our example supports the conclusion that the advantages may outweigh the disadvantages in some cases, including for some analyses of unconfidentialised business data, provided the analyst is aware of the output confidentialisation methods and their potential impact.