Private data release via learning thresholds

  • Authors:
  • Moritz Hardt; Guy N. Rothblum; Rocco A. Servedio

  • Affiliations:
  • Princeton University; Microsoft Research, Silicon Valley Campus; Columbia University and Princeton University

  • Venue:
  • Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
  • Year:
  • 2012

Abstract

This work considers computationally efficient privacy-preserving data release. We study the task of analyzing a database containing sensitive information about individual participants. Given a set of statistical queries on the data, we want to release approximate answers to the queries while also guaranteeing differential privacy, that is, protecting each participant's sensitive data. Our focus is on computationally efficient data release algorithms; we seek algorithms whose running time is polynomial, or at least sub-exponential, in the data dimensionality.

Our primary contribution is a computationally efficient reduction from differentially private data release for a class of counting queries to learning thresholded sums of predicates from a related class. We instantiate this general reduction with algorithms for learning thresholds, obtaining new results for differentially private data release. As two examples, taking {0, 1}^d to be the data domain (of dimension d), we obtain differentially private algorithms for:

1. Releasing all k-way conjunction counting queries (or k-way contingency tables). For any given k, the resulting data release algorithm has bounded error as long as the database is of size at least d^{Õ(√k)} (ignoring the dependence on other parameters). The running time is polynomial in the database size. The best sub-exponential time algorithms known prior to our work required a database of size Õ(d^{k/2}) [Dwork, McSherry, Nissim, and Smith 2006].

2. Releasing any family of counting queries that is specified by a constant-depth AC^0 predicate. This algorithm releases accurate answers to a (1 − γ)-fraction of the queries in the family. For any γ ≥ quasipoly(1/d), the algorithm has bounded error as long as the database is of size at least quasipoly(d) (again ignoring the dependence on other parameters). The running time is quasipoly(d).

The first learning algorithm uses techniques for representing thresholded sums of predicates as low-degree polynomial threshold functions. The second learning algorithm is based on a result of Jackson, Klivans, and Servedio [JKS 2002], and utilizes Fourier analysis of the database viewed as a function mapping queries to answers.
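
To make the query class concrete, the Python sketch below spells out what a k-way conjunction counting query over {0, 1}^d is and shows the simplest independent-noise baseline in the spirit of Dwork, McSherry, Nissim, and Smith [2006], which the paper's reduction improves on. This is an illustrative sketch only, not the paper's algorithm; the function names and parameter choices are ours.

```python
import itertools
import numpy as np

def conjunction_counting_query(db, subset):
    """Fraction of rows of db (an n x d 0/1 matrix) whose bits are all 1
    on the coordinates in `subset`: a k-way conjunction counting query."""
    return float(np.mean(np.all(db[:, subset] == 1, axis=1)))

def laplace_release_k_way(db, k, epsilon):
    """Baseline release of all k-way conjunction counting queries with
    independent Laplace noise.  Changing one row moves each fractional
    answer by at most 1/n, so adding Laplace noise of scale m/(epsilon*n)
    to each of the m = C(d, k) queries gives epsilon-differential privacy
    by basic composition.  (The Õ(d^{k/2}) database-size requirement quoted
    in the abstract comes from tighter composition arguments; this sketch
    only illustrates the query class and the independent-noise baseline.)"""
    n, d = db.shape
    queries = list(itertools.combinations(range(d), k))
    scale = len(queries) / (epsilon * n)
    rng = np.random.default_rng()
    return {s: conjunction_counting_query(db, list(s)) + rng.laplace(0.0, scale)
            for s in queries}

# Toy usage: 100 records over d = 6 binary attributes, all 2-way conjunctions.
db = np.random.default_rng(0).integers(0, 2, size=(100, 6))
answers = laplace_release_k_way(db, k=2, epsilon=1.0)
```

The point of the comparison is that the number of k-way conjunctions grows like d^k, so per-query noise mechanisms need databases polynomially large in d^k to stay accurate, whereas the learning-based release described above needs only d^{Õ(√k)} records.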