The geometry of differential privacy: the sparse and approximate cases

  • Authors:
  • Aleksandar Nikolov;Kunal Talwar;Li Zhang

  • Affiliations:
  • Rutgers University, Piscataway, NJ, USA;Microsoft Reserach SVC, Mountain View, CA, USA;Microsoft Research SVC, Mountain View, CA, USA

  • Venue:
  • Proceedings of the forty-fifth annual ACM symposium on Theory of computing
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

We study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries and has been the focus of a long line of work. For a given set of d linear queries over a database x ∈ RN, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, [5, 32] give an O(log2 d) approximation to the optimal mechanism. Our first contribution is to give an efficient O(log2 d) approximation guarantee for the case of (ε,δ)-differential privacy. Our mechanism adds carefully chosen correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of [44], using tools from convex geometry. We next consider the sparse case when the number of queries exceeds the number of individuals in the database, i.e. when d n Δ |x|1. The lower bounds used in the previous approximation algorithm no longer apply --- in fact better mechanisms are known in this setting [7, 27, 28, 31, 49]. Our second main contribution is to give an efficient (ε,δ)-differentially private mechanism that, for any given query set A and an upper bound n on |x|1, has mean squared error within polylog(d,N) of the optimal for A and n. This approximation is achieved by coupling the Gaussian noise addition approach with linear regression over the l1 ball. Additionally, we show a similar polylogarithmic approximation guarantee for the optimal ε-differentially private mechanism in this sparse setting. Our work also shows that for arbitrary counting queries, i.e. A with entries in {0,1}, there is an ε-differentially private mechanism with expected error ~O(√n) per query, improving on the ~O(n2/3) bound of [7] and matching the lower bound implied by [15] up to logarithmic factors. The connection between the hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix A.