Differentially private summaries for sparse data

  • Authors: Graham Cormode; Cecilia Procopiuc; Divesh Srivastava; Thanh T. L. Tran

  • Affiliations: AT&T Labs--Research; AT&T Labs--Research; AT&T Labs--Research; University of Massachusetts, Amherst

  • Venue: Proceedings of the 15th International Conference on Database Theory
  • Year: 2012

Abstract

Differential privacy is fast becoming the method of choice for releasing data under strong privacy guarantees. A standard mechanism is to add noise to the counts in contingency tables derived from the dataset. However, when the dataset is sparse in its underlying domain, this vastly increases the size of the published data, to the point of making the mechanism infeasible. We propose a general framework to overcome this problem. Our approach releases a compact summary of the noisy data with the same privacy guarantee and with similar utility. Our main result is an efficient method for computing the summary directly from the input data, without materializing the vast noisy data. We instantiate this general framework for several summarization methods. Our experiments show that this is a highly practical solution: The summaries are up to 1000 times smaller, and can be computed in less than 1% of the time compared to standard methods. Finally, our framework works with various data transformations, such as wavelets or sketches.
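The idea of computing a summary without materializing the noisy data can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes Laplace noise with sensitivity 1 and a simple threshold filter as the summary (keep only noisy counts whose magnitude is at least theta); the abstract fixes neither choice. The function names and parameters are hypothetical. The sketch adds noise to the non-zero cells explicitly, then uses geometric skipping to pick which zero cells would have survived the filter, so the output follows the same distribution as noising every cell and filtering afterwards.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sparse_noisy_summary(counts, domain_size, epsilon, theta, rng):
    """Threshold-filtered noisy counts, built without touching every cell.

    counts: dict mapping cell index -> non-zero true count (hypothetical layout).
    Assumes sensitivity 1 (one record changes one count by 1) and theta > 0.
    """
    scale = 1.0 / epsilon
    summary = {}

    # Step 1: non-zero cells are few, so noise and filter them directly.
    for cell, count in counts.items():
        noisy = count + laplace(scale, rng)
        if abs(noisy) >= theta:
            summary[cell] = noisy

    # Step 2: a zero cell survives the filter with probability
    # p = P(|Laplace(scale)| >= theta) = exp(-theta / scale).
    p = math.exp(-theta / scale)
    log_q = math.log1p(-p)  # log(1 - p), negative because theta > 0

    # Geometric skipping: jump straight to the next surviving position
    # instead of flipping a coin for each of the domain_size cells.
    pos = -1
    while True:
        u = 1.0 - rng.random()  # uniform in (0, 1]
        pos += 1 + int(math.log(u) / log_q)
        if pos >= domain_size:
            break
        if pos in counts:
            continue  # non-zero cells were already handled in step 1
        # Conditioned on |noise| >= theta, the magnitude is theta plus an
        # exponential (memorylessness), with a uniformly random sign.
        magnitude = theta + rng.expovariate(1.0 / scale)
        summary[pos] = magnitude if rng.random() < 0.5 else -magnitude

    return summary
```

The work done is proportional to the number of non-zero cells plus the number of zero cells that pass the filter, rather than to the size of the domain. Filtering is post-processing of the noisy counts, so under the stated assumptions it does not weaken the privacy guarantee of the noise addition.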