The power of the Dinur-Nissim algorithm: breaking privacy of statistical and graph databases

Authors:
Krzysztof Choromanski; Tal Malkin
Affiliations:
Columbia University, New York, NY, USA; Columbia University, New York, NY, USA
Venue:
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Year:
2012

Citing 21

Privacy-preserving data mining. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data.
Revealing information while preserving privacy. Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Simulatable auditing. Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Towards robustness in query auditing. VLDB '06 Proceedings of the 32nd international conference on Very large data bases.
The price of privacy and the limits of LP decoding. Proceedings of the thirty-ninth annual ACM symposium on Theory of computing.
New Efficient Attacks on Statistical Disclosure Control Mechanisms. CRYPTO 2008 Proceedings of the 28th Annual conference on Cryptology: Advances in Cryptology.
The Differential Privacy Frontier (Extended Abstract). TCC '09 Proceedings of the 6th Theory of Cryptography Conference on Theory of Cryptography.
Differentially private recommender systems: building privacy into the net. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.
Private information retrieval. Communications of the ACM.
Achieving anonymity via clustering. ACM Transactions on Algorithms (TALG).
On the geometry of differential privacy. Proceedings of the forty-second ACM symposium on Theory of computing.
Differential privacy under continual observation. Proceedings of the forty-second ACM symposium on Theory of computing.
The price of privately releasing contingency tables and the spectra of random matrices with correlated rows. Proceedings of the forty-second ACM symposium on Theory of computing.
Differentially-private network trace analysis. Proceedings of the ACM SIGCOMM 2010 conference.
Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms. SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms.
Differential privacy in new settings. SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms.
Differentially private combinatorial optimization. SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms.
Ask a better question, get a better answer a new approach to private data analysis. ICDT'07 Proceedings of the 11th international conference on Database Theory.
Differential privacy. ICALP'06 Proceedings of the 33rd international conference on Automata, Languages and Programming - Volume Part II.
Our data, ourselves: privacy via distributed noise generation. EUROCRYPT'06 Proceedings of the 24th annual international conference on The Theory and Applications of Cryptographic Techniques.
Calibrating noise to sensitivity in private data analysis. TCC'06 Proceedings of the Third conference on Theory of Cryptography.

Abstract

A few years ago, Dinur and Nissim (PODS, 2003) proposed an algorithm for breaking database privacy when statistical queries are answered with a perturbation error of magnitude o(√n) for a database of size n. This negative result is very strong: the algorithm completely reconstructs Ω(n) data bits, is simple, uses random queries, and places no restriction on the perturbation other than its magnitude. Their algorithm works in a model where the database consists of bits and the statistical queries asked by the adversary are sum queries over a subset of locations. In this paper we extend the attack to much more general settings in terms of the type of statistical query allowed, the database domain, and the general tradeoff between perturbation and privacy. Specifically, we show the following.

(1) For queries of the type ∑_{i=1}^{n} φ_i x_i, where the φ_i are i.i.d. with finite third moment and positive variance (this includes as special cases the sum queries of Dinur-Nissim and several subsequent extensions), we prove that the quadratic relation between the perturbation and what the adversary can reconstruct holds even for smaller perturbations and even for a larger data domain. If φ_i is Gaussian, Poisson, or bounded with positive variance, this holds for arbitrary data domains and perturbations; for other φ_i it holds as long as the domain is not too large and the perturbation is not too small.

(2) A positive result showing that, for sum queries, the negative result mentioned above is tight. Specifically, we build a distribution on bit databases and an answering algorithm such that any adversary who tries to recover slightly more than the negative result allows succeeds only with negligible probability.

(3) We consider a richer class of summation queries, focusing on databases representing graphs, where each entry is an edge and the query is a structural function of a subgraph. We show an attack that recovers a large portion of the graph edges, as long as the graph and the function satisfy certain properties.

The attacking algorithms in both our negative results are straightforward extensions of the Dinur-Nissim attack, based on asking φ-weighted queries or queries that choose a subgraph uniformly at random. The novelty of our work is in the analysis, which shows that this simple attack is much more powerful than was previously known, points to possible limits of the approach, and puts forth new application domains such as graph problems (which may occur in social networks, Internet graphs, etc.). These results may find applications not only for breaking privacy but also in the positive direction, for recovering complex structural information from inaccurate estimates about its substructures.
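
To make the attack model concrete, the following is a minimal sketch of a reconstruction attack with random φ-weighted queries, not the algorithm or analysis from the paper. It assumes weights φ_i drawn uniformly from {-1, +1} (one bounded distribution with positive variance), a curator that adds uniform noise bounded by max_error, and a plain least-squares decoder run by gradient descent followed by rounding. The decoder is swapped in for brevity. All function names and parameter values are hypothetical.

```python
import random


def make_queries(m, n, rng):
    """Draw m random query vectors with i.i.d. weights phi_i in {-1, +1}."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]


def answer(queries, db, max_error, rng):
    """Curator (assumed model): weighted sum plus noise bounded by max_error."""
    return [
        sum(p * x for p, x in zip(q, db)) + rng.uniform(-max_error, max_error)
        for q in queries
    ]


def reconstruct(queries, answers, n, steps=60, lr=0.5):
    """Fit the answers by least squares (gradient descent), then round to bits."""
    m = len(queries)
    est = [0.5] * n  # start from the midpoint of the bit domain
    for _ in range(steps):
        # Residual of every query under the current estimate.
        residuals = [
            a - sum(p * c for p, c in zip(q, est))
            for q, a in zip(queries, answers)
        ]
        # Gradient step on the mean squared residual.
        for j in range(n):
            grad_j = sum(q[j] * r for q, r in zip(queries, residuals)) / m
            est[j] += lr * grad_j
    return [1 if c > 0.5 else 0 for c in est]


def run_attack(n=32, m=320, max_error=2.0, seed=0):
    """Return the fraction of database bits the adversary recovers."""
    rng = random.Random(seed)
    db = [rng.randint(0, 1) for _ in range(n)]
    queries = make_queries(m, n, rng)
    answers = answer(queries, db, max_error, rng)
    guess = reconstruct(queries, answers, n)
    return sum(g == x for g, x in zip(guess, db)) / n


if __name__ == "__main__":
    print("fraction of bits recovered:", run_attack())
```

In this toy setting the noise bound (2.0) is below √32 ≈ 5.7, loosely mirroring the o(√n) perturbation regime at a fixed small size. Raising max_error well above √n gives a rough feel for where reconstruction starts to degrade, though this sketch makes no claim about matching the paper's thresholds.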