PSD'10 Proceedings of the 2010 international conference on Privacy in statistical databases
This paper is concerned with the challenge of allowing statistical analysis of confidential business data while maintaining confidentiality. The most widely used approach to date is statistical disclosure control, which involves modifying or confidentialising data before releasing it to users. Newer proposed approaches include the release of multiply imputed synthetic data in place of the original data, and the use of a remote analysis system that enables users to submit statistical queries and receive output without direct access to the data. Most implementations of statistical disclosure control methods to date involve census or survey microdata on individual persons, because existing methods are generally acknowledged to provide inadequate confidentiality protection for business (or enterprise) data. In this paper we compare the statistical disclosure control approach with the remote analysis approach in the context of protecting the confidentiality of business data in statistical analysis. We provide an example that enables a side-by-side comparison of the outputs of exploratory data analysis and linear regression analysis conducted on a sample business dataset under these two approaches, with traditional unconfidentialised results as a standard for comparison. The remote analysis approach has both advantages and disadvantages, and it is unlikely to replace statistical disclosure control methods in all applications. If the disadvantages are judged too serious in a given situation, the analyst may have to seek access to the unconfidentialised dataset. However, our example supports the conclusion that the advantages may outweigh the disadvantages in some cases, including for some analyses of unconfidentialised business data, provided the analyst is aware of the output confidentialisation methods and their potential impact.
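The contrast between the two approaches can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify which confidentialisation techniques are used, so additive noise (for statistical disclosure control) and coefficient rounding (for remote analysis output) are assumptions chosen only for illustration. The data are synthetic, and the function names `sdc_release` and `remote_regression` are hypothetical.

```python
import numpy as np


def fit_ols(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns array [b0, b1]."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def sdc_release(x, y, rng, noise_sd=0.5):
    """SDC route (assumed technique: additive noise).

    The data are perturbed once, then released to the analyst.
    """
    x_pert = x + rng.normal(0.0, noise_sd, x.shape)
    y_pert = y + rng.normal(0.0, noise_sd, y.shape)
    return x_pert, y_pert


def remote_regression(x, y, decimals=1):
    """Remote route (assumed technique: rounding the output).

    The model is fitted on the original data inside the server;
    only rounded coefficients leave it, never the records themselves.
    """
    return np.round(fit_ols(x, y), decimals)


# Synthetic toy data, not taken from the paper.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, 200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, 200)

baseline = fit_ols(x, y)                 # unconfidentialised standard
x_sdc, y_sdc = sdc_release(x, y, rng)    # analyst receives perturbed data
sdc = fit_ols(x_sdc, y_sdc)              # analyst fits on perturbed data
remote = remote_regression(x, y)         # analyst receives output only

print("baseline:", baseline)
print("SDC:     ", sdc)
print("remote:  ", remote)
```

Under these assumptions the analyst in the SDC route holds a modified dataset and can run any analysis on it, whereas in the remote route the analyst sees only the confidentialised output of each submitted query. That difference is the trade-off the side-by-side comparison is meant to expose.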