Pan-private algorithms via statistics on sketches

  • Authors: Darakhshan Mir; S. Muthukrishnan; Aleksandar Nikolov; Rebecca N. Wright

  • Affiliations: Rutgers University, Piscataway, NJ, USA (all four authors)

  • Venue: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year: 2011

Abstract

Consider fully dynamic data, where we track data as it gets inserted and deleted. There are well-developed notions of private data analysis with dynamic data, for example, using differential privacy. We want to go beyond privacy, and consider privacy together with security, formulated recently as pan-privacy by Dwork et al. (ICS 2010). Informally, pan-privacy preserves differential privacy while computing desired statistics on the data, even if the internal memory of the algorithm is compromised (say, by a malicious break-in, by insider curiosity, or by fiat of the government or law). We study pan-private algorithms for basic analyses, like estimating distinct count, moments, and heavy-hitter count, with fully dynamic data. We present the first known pan-private algorithms for these problems in the fully dynamic model. Our algorithms rely on sketching techniques popular in streaming: in some cases, we add suitable noise to a previously known sketch, using a novel approach of calibrating noise to the underlying problem structure and the projection matrix of the sketch; in other cases, we maintain certain statistics on sketches; in yet others, we define novel sketches. We also present the first known lower bounds explicitly for pan-privacy, showing our results to be nearly optimal for these problems. Our lower bounds are stronger than those implied by differential privacy or dynamic data streaming alone, and they hold even if unbounded memory and/or unbounded processing time are allowed. The lower bounds use a noisy decoding argument and exploit a connection between pan-private algorithms and data sanitization.
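
To make the phrase "add suitable noise to a previously known sketch" concrete, here is a minimal Python sketch of the general idea. It is not the algorithm from the paper. It shows a linear sign sketch whose stored state is seeded with Laplace noise, so that a single look at memory sees only a noisy projection of the data. The class name `NoisySignSketch`, the hash-based signs, and the noise scale `rows / epsilon` are all assumptions made for illustration. The scale is chosen under the further assumption that neighboring streams differ by at most 1 in one coordinate of the frequency vector at every point in time. A full pan-private algorithm would also add fresh noise when producing output, which this example omits.

```python
import hashlib
import random


def laplace(rng, scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class NoisySignSketch:
    """Illustrative linear sketch with a noisy internal state (hypothetical)."""

    def __init__(self, rows, epsilon, seed=None):
        self.rows = rows
        # Assumption: one unit change in the frequency vector moves every
        # row by 1, so the L1 sensitivity of the state is `rows`.
        self.scale = rows / epsilon
        # A fixed seed is for reproducible demos only; real use needs
        # secret randomness, which seed=None draws from the OS.
        rng = random.Random(seed)
        self.state = [laplace(rng, self.scale) for _ in range(rows)]

    def _sign(self, row, item):
        # Heuristic +1/-1 sign from a hash; the usual analysis of sign
        # sketches assumes 4-wise independent signs instead.
        data = f"{row}:{item}".encode()
        digest = hashlib.blake2b(data, digest_size=1).digest()
        return 1 if digest[0] & 1 else -1

    def update(self, item, delta):
        """Apply an insertion (delta=+1) or a deletion (delta=-1)."""
        for row in range(self.rows):
            self.state[row] += delta * self._sign(row, item)

    def f2_estimate(self):
        """Estimate the second moment, subtracting the noise variance."""
        mean_sq = sum(v * v for v in self.state) / self.rows
        # The variance of Laplace(0, b) is 2 * b**2.
        return mean_sq - 2.0 * self.scale ** 2
```

Because the sketch is linear, inserting an item and then deleting it returns the state to its initial noisy value (up to floating-point rounding), which is what lets this style of sketch handle fully dynamic data.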