The price of privacy and the limits of LP decoding

  • Authors:
  • Cynthia Dwork; Frank McSherry; Kunal Talwar

  • Affiliations:
  • Microsoft Research SVC, Mountain View, CA; Microsoft Research SVC, Mountain View, CA; Microsoft Research SVC, Mountain View, CA

  • Venue:
  • Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
  • Year:
  • 2007

Abstract

This work is at the intersection of two lines of research. One line, initiated by Dinur and Nissim, investigates the price, in accuracy, of protecting privacy in a statistical database. The second, growing from an extensive literature on compressed sensing (see in particular the work of Donoho and collaborators [4,7,13,11]) and explicitly connected to error-correcting codes by Candès and Tao ([4]; see also [5,3]), is in the use of linear programming for error correction. Our principal result is the discovery of a sharp threshold ρ* ≈ 0.239, so that if ρ < ρ* and A is a random m × n encoding matrix of independently chosen standard Gaussians, where m = O(n), then with overwhelming probability over the choice of A, for all x ∈ R^n, LP decoding corrects ⌊ρm⌋ arbitrary errors in the encoding Ax, while decoding can be made to fail if the error rate exceeds ρ*. Our bound resolves an open question of Candès, Rudelson, Tao, and Vershynin [3] and (oddly, but explicably) refutes empirical conclusions of Donoho [11] and Candès et al. [3]. By scaling and rounding we can easily transform these results to obtain polynomial-time decodable random linear codes with polynomial-sized alphabets tolerating any ρ < ρ* fraction of arbitrary errors. In the context of privacy-preserving data mining, our results say that any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private.
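To make the LP-decoding setup concrete, here is a minimal, hypothetical Python sketch (not from the paper) of the ℓ1-minimization decoder the abstract refers to: given a corrupted encoding y = Ax + e, where e has at most ⌊ρm⌋ arbitrary nonzero entries, solve min_x ‖y − Ax‖₁ as a linear program. The dimensions, random seed, and error magnitudes are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch (not from the paper) of LP decoding via l1-minimization:
# recover x from y = A x + e, where e corrupts at most floor(rho*m) entries
# arbitrarily. Dimensions, seed, and error magnitudes are made up for the demo.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 20, 120                      # message length n, encoding length m = O(n)
rho = 0.20                          # error rate below the rho* ~= 0.239 threshold

A = rng.standard_normal((m, n))     # random matrix of i.i.d. standard Gaussians
x_true = rng.standard_normal(n)
y = A @ x_true

# Adversarially corrupt floor(rho*m) coordinates of the encoding.
bad = rng.choice(m, size=int(rho * m), replace=False)
y[bad] += 10.0 * rng.standard_normal(bad.size)

# LP decoding: min_x ||y - A x||_1, written as a linear program with
# slack variables t:  minimize 1^T t  subject to  -t <= y - A x <= t.
# Decision vector z = [x (n vars), t (m vars)].
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[A, -np.eye(m)],     #  A x - t <= y
                 [-A, -np.eye(m)]])   # -A x - t <= -y
b_ub = np.concatenate([y, -y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (n + m))

x_hat = res.x[:n]
print("max coordinate error:", np.abs(x_hat - x_true).max())  # ~0 on success
```

Read through the privacy lens, x plays the role of the database and the rows of A are the randomly generated weighted subset sum queries: a mechanism that answers a 0.761 fraction of such queries reasonably accurately and the remaining 0.239 fraction arbitrarily hands this decoder exactly the input it needs, which is the sense in which such a mechanism is blatantly non-private.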