Stochastic Protection of Confidential Information in Databases: A Hybrid of Data Perturbation and Query Restriction

  • Authors:
  • Manuel A. Nunez; Robert S. Garfinkel; Ram D. Gopal

  • Affiliations:
  • School of Business, University of Connecticut, Storrs, Connecticut 06269 (all three authors)

  • Venue:
  • Operations Research
  • Year:
  • 2007

Abstract

Data perturbation and query restriction are two methods developed to protect confidential data in statistical databases. In the former, the data are systematically changed so that answers to queries are statistically similar to those that would have resulted from the original data. The latter provides exact answers to queries as long as the risk of exact disclosure of confidential data does not become too great. We present a new methodology that combines these techniques so that the advantages of both are captured. The model is applicable to, and computationally viable for, large databases, whether the queries are linear or nonlinear. The query restriction phase consists of finding an optimal subset of queries to answer exactly without compromising the database. This is an NP-hard problem with a matroid intersection structure that lends itself to an efficient greedy heuristic. Then, given the queries that are answered exactly, we implement a data perturbation phase that provides stochastic protection and keeps answers consistent. We present computational results on a large database with both linear and nonlinear queries. The results indicate that many queries can be answered exactly and that the proposed perturbation approach provides more accurate answers than the standard perturbation method.
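To make the idea of a query restriction phase concrete, here is a minimal Python sketch of a generic greedy audit for linear SUM queries only. It assumes each query is a coefficient vector over n records, and that the database is compromised exactly when some single value x_i is a linear combination of the answered queries (the unit vector e_i lies in their row span). The function names `reduce_vec`, `add_row`, `discloses`, and `greedy_select` are hypothetical. This is not the authors' matroid-intersection heuristic; it ignores nonlinear queries and the perturbation phase entirely.

```python
from fractions import Fraction

# Hypothetical sketch (not the paper's algorithm): greedy query restriction
# for linear SUM queries. Answering a set of queries compromises the database
# if some unit vector e_i lies in their row span. Exact rational arithmetic
# (Fraction) avoids floating-point rank errors.


def reduce_vec(vec, basis):
    """Reduce vec against a basis kept in reduced row echelon form.

    basis is a list of (pivot, row) pairs with row[pivot] == 1 and every
    other basis row equal to 0 in that pivot column. The result is the zero
    vector exactly when vec lies in the span of the basis rows.
    """
    v = [Fraction(x) for x in vec]
    for pivot, row in basis:
        if v[pivot] != 0:
            factor = v[pivot]
            v = [a - factor * b for a, b in zip(v, row)]
    return v


def add_row(vec, basis):
    """Return a new RREF basis spanning basis plus vec (basis is unchanged)."""
    v = reduce_vec(vec, basis)
    pivot = next((j for j, a in enumerate(v) if a != 0), None)
    if pivot is None:
        return basis  # vec is already implied by the answered queries
    v = [a / v[pivot] for a in v]  # normalize so v[pivot] == 1
    new_basis = []
    for p, row in basis:
        if row[pivot] != 0:  # clear the new pivot column in older rows
            f = row[pivot]
            row = [a - f * b for a, b in zip(row, v)]
        new_basis.append((p, row))
    new_basis.append((pivot, v))
    return new_basis


def discloses(basis, n):
    """True if some single record value x_i is determined exactly."""
    for i in range(n):
        e = [0] * n
        e[i] = 1
        if all(a == 0 for a in reduce_vec(e, basis)):
            return True
    return False


def greedy_select(queries, n):
    """Scan queries in priority order; keep each one that stays safe."""
    basis, answered = [], []
    for k, q in enumerate(queries):
        trial = add_row(q, basis)
        if not discloses(trial, n):
            basis = trial
            answered.append(k)
    return answered


if __name__ == "__main__":
    # Four records x0..x3; each query is the SUM over the flagged records.
    queries = [
        [1, 1, 0, 0],  # x0 + x1
        [1, 1, 1, 0],  # x0 + x1 + x2: together with query 0 reveals x2
        [0, 0, 1, 1],  # x2 + x3
        [1, 0, 1, 0],  # x0 + x2
        [1, 0, 0, 0],  # x0 alone
    ]
    print(greedy_select(queries, 4))  # [0, 2, 3]
```

The result depends on the order in which queries are scanned, which is why finding an optimal subset is the hard part the abstract refers to. In a hybrid scheme like the one described, the rejected queries (here 1 and 4) would presumably be answered from perturbed data rather than refused. The check also costs n reductions per candidate query, so this sketch is only meant for small illustrations, not large databases.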