A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis

  • Authors:
  • Moritz Hardt;Guy N. Rothblum

  • Affiliations:
  • -;-

  • Venue:
  • FOCS '10 Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
  • Year:
  • 2010

Quantified Score

Hi-index 0.02

Visualization

Abstract

We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacy-preserving answers to queries as they arrive. Our primary contribution is a new differentially private multiplicative weights mechanism for answering a large number of interactive counting (or linear) queries that arrive online and may be adaptively chosen. This is the first mechanism with worst-case accuracy guarantees that can answer large numbers of interactive queries and is {\em efficient} (in terms of the runtime's dependence on the data universe size). The error is asymptotically \emph{optimal} in its dependence on the number of participants, and depends only logarithmically on the number of queries being answered. The running time is nearly {\em linear} in the size of the data universe. As a further contribution, when we relax the utility requirement and require accuracy only for databases drawn from a rich class of databases, we obtain exponential improvements in running time. Even in this relaxed setting we continue to guarantee privacy for {\em any} input database. Only the utility requirement is relaxed. Specifically, we show that when the input database is drawn from a {\em smooth} distribution — a distribution that does not place too much weight on any single data item — accuracy remains as above, and the running time becomes {\em poly-logarithmic} in the data universe size. The main technical contributions are the application of multiplicative weights techniques to the differential privacy setting, a new privacy analysis for the interactive setting, and a technique for reducing data dimensionality for databases drawn from smooth distributions.