Remote access methods for exploratory data analysis and statistical modelling: Privacy-Preserving Analytics®

  • Authors:
  • Ross Sparks;Chris Carter;John B. Donnelly;Christine M. O'Keefe;Jodie Duncan;Tim Keighley;Damien McAullay

  • Affiliations:
  • CSIRO Mathematical and Information Sciences, Locked Bag 17, Herring Road, North Ryde, NSW 2113, Australia (all authors)

  • Venue:
  • Computer Methods and Programs in Biomedicine
  • Year:
  • 2008

Abstract

This paper addresses the challenge of enabling the use of confidential or private data for research and policy analysis while protecting confidentiality and privacy by reducing the risk of disclosing sensitive information. Traditional approaches to reducing disclosure risk include releasing de-identified data and modifying data before release. We discuss an alternative approach: a remote analysis server that releases no data but is instead designed to deliver useful results of user-specified statistical analyses with a low risk of disclosure. The techniques described here allow a user to apply a wide range of methods in exploratory data analysis, regression and survival analysis, while reducing the risk that the user can read or infer any individual record's attribute values. We illustrate our methods with examples from biostatistics using publicly available data. We have implemented our techniques in a software demonstrator called Privacy-Preserving Analytics® (PPA®), via a web-based interface to the R software. We believe that PPA® may, in some situations, provide an effective balance between the competing goals of providing useful information and reducing disclosure risk.
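
The remote analysis server idea can be illustrated with a minimal sketch. The abstract does not say which protective measures PPA® applies, so everything below is an assumption made for illustration: the class name RemoteAnalysisServer, the method fit_line, the minimum-record threshold and the rounding of coefficients are all hypothetical, and the sketch is written in Python rather than R. It shows only the general pattern in which the records stay on the server and the user receives aggregate model output.

```python
from statistics import mean

MIN_RECORDS = 10  # hypothetical threshold, not taken from the paper


class RemoteAnalysisServer:
    """Holds confidential records and answers requests with aggregates only."""

    def __init__(self, records):
        self._records = list(records)  # never returned to the caller

    def fit_line(self, x_name, y_name, digits=2):
        """Least-squares fit of y on x; releases rounded coefficients only."""
        rows = [(r[x_name], r[y_name]) for r in self._records]
        if len(rows) < MIN_RECORDS:
            # Assumed rule: refuse analyses on very small data sets.
            raise PermissionError("too few records to release a result")
        xs = [x for x, _ in rows]
        ys = [y for _, y in rows]
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("predictor has no variation")
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        # Assumed rule: round the output; no residuals or fitted values leave.
        return {
            "n": len(rows),
            "intercept": round(intercept, digits),
            "slope": round(slope, digits),
        }


# Synthetic data for illustration only (response = 2 * dose + 1).
records = [{"dose": x, "response": 2 * x + 1} for x in range(20)]
server = RemoteAnalysisServer(records)
result = server.fit_line("dose", "response")
print(result)
```

In this sketch the user sees a sample size and two rounded coefficients, never a record. A real system would need much more than this, since repeated or carefully chosen queries can still leak information about individuals.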