The power of the Dinur-Nissim algorithm: breaking privacy of statistical and graph databases

  • Authors:
  • Krzysztof Choromanski; Tal Malkin

  • Affiliations:
  • Columbia University, New York, NY, USA; Columbia University, New York, NY, USA

  • Venue:
  • PODS '12: Proceedings of the 31st Symposium on Principles of Database Systems
  • Year:
  • 2012

Abstract

A few years ago, Dinur and Nissim (PODS, 2003) proposed an algorithm for breaking database privacy when statistical queries are answered with a perturbation error of magnitude o(√n) for a database of size n. This negative result is very strong: the algorithm completely reconstructs Ω(n) data bits, is simple, uses random queries, and places no restriction on the perturbation other than its magnitude. Their algorithm works in a model where the database consists of bits and the adversary's statistical queries are sum queries over a subset of locations. In this paper we extend the attack to much more general settings in terms of the type of statistical query allowed, the database domain, and the general tradeoff between perturbation and privacy. Specifically, we show:

  • For queries of the type $\sum_{i=1}^{n} \varphi_i x_i$, where the $\varphi_i$ are i.i.d. with a finite third moment and positive variance (this includes as special cases the sum queries of Dinur-Nissim and several subsequent extensions), the quadratic relation between the perturbation and what the adversary can reconstruct holds even for smaller perturbations and even for a larger data domain. If $\varphi_i$ is Gaussian, Poissonian, or bounded and of positive variance, this holds for arbitrary data domains and perturbations; for other $\varphi_i$ it holds as long as the domain is not too large and the perturbation is not too small.
  • A positive result showing that, for sum queries, the negative result above is tight. Specifically, we build a distribution on bit databases and an answering algorithm such that any adversary who tries to recover slightly more than the negative result allows will fail except with negligible probability.
  • An attack for a richer class of summation queries, focusing on databases that represent graphs, where each entry is an edge and the query is a structural function of a subgraph. The attack recovers a large portion of the graph edges, as long as the graph and the function satisfy certain properties.

The attacking algorithms in both our negative results are straightforward extensions of the Dinur-Nissim attack, based on asking φ-weighted queries or queries that choose a subgraph uniformly at random. The novelty of our work is in the analysis, which shows that this simple attack is much more powerful than was previously known, points to possible limits of the approach, and puts forth new application domains such as graph problems (which may arise in social networks, Internet graphs, etc.). These results may find applications not only in breaking privacy but also in the positive direction, for recovering complicated structural information from inaccurate estimates about its substructures.
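
To make the query model concrete, the following is a minimal Python sketch of a reconstruction attack based on random φ-weighted queries. It is not the algorithm analyzed in the paper: it assumes Rademacher weights (φ_i = ±1 with equal probability, one instance of a bounded distribution with positive variance), independent uniform noise of bounded magnitude, and a simple correlation decoder with a 0.5 threshold in place of the original Dinur-Nissim decoding step. All function names and parameters are hypothetical and chosen for illustration.

```python
import random


def make_database(n, rng):
    """Secret bit database x in {0,1}^n (illustrative)."""
    return [rng.randint(0, 1) for _ in range(n)]


def answer_query(x, weights, noise_bound, rng):
    """Curator: true weighted sum plus a perturbation in [-noise_bound, noise_bound]."""
    true_sum = sum(w * xi for w, xi in zip(weights, x))
    return true_sum + rng.uniform(-noise_bound, noise_bound)


def reconstruct(n, num_queries, oracle, rng):
    """Correlation decoder (an assumption of this sketch, not the paper's method).

    With i.i.d. +/-1 weights, E[phi_i * answer] = x_i when the noise is
    independent of the weights, so averaging phi_i * answer over many
    queries estimates x_i; the estimate is then rounded to a bit.
    """
    score = [0.0] * n
    for _ in range(num_queries):
        weights = [rng.choice((-1, 1)) for _ in range(n)]
        answer = oracle(weights)
        for i, w in enumerate(weights):
            score[i] += w * answer
    return [1 if s / num_queries > 0.5 else 0 for s in score]


def run_demo(n=64, queries_per_entry=64, noise_bound=2.0, seed=0):
    """Return the fraction of database bits the decoder recovers."""
    rng = random.Random(seed)
    x = make_database(n, rng)

    def oracle(weights):
        return answer_query(x, weights, noise_bound, rng)

    guess = reconstruct(n, n * queries_per_entry, oracle, rng)
    agree = sum(g == xi for g, xi in zip(guess, x))
    return agree / n


if __name__ == "__main__":
    print(f"fraction of bits recovered: {run_demo():.3f}")
```

The sketch uses many more queries than entries and relies on the noise being random rather than adversarial, so it illustrates only the shape of the attack (random weighted queries, perturbed answers, rounding), not the worst-case guarantees stated in the abstract.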