Differentially private summaries for sparse data

Authors:
Graham Cormode;Cecilia Procopiuc;Divesh Srivastava;Thanh T. L. Tran
Affiliations:
AT&T Labs--Research;AT&T Labs--Research;AT&T Labs--Research;University of Massachusetts, Amherst
Venue:
Proceedings of the 15th International Conference on Database Theory
Year:
2012

Abstract

Differential privacy is fast becoming the method of choice for releasing data under strong privacy guarantees. A standard mechanism is to add noise to the counts in contingency tables derived from the dataset. However, when the dataset is sparse in its underlying domain, adding noise to every cell, including the many empty ones, vastly increases the size of the published data, to the point of making the mechanism infeasible. We propose a general framework to overcome this problem. Our approach releases a compact summary of the noisy data with the same privacy guarantee and with similar utility. Our main result is an efficient method for computing the summary directly from the input data, without materializing the vast noisy data. We instantiate this general framework for several summarization methods. Our experiments show that this is a highly practical solution: the summaries are up to 1000 times smaller, and can be computed in less than 1% of the time required by standard methods. Finally, our framework works with various data transformations, such as wavelets or sketches.
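
The idea of computing a summary directly from sparse input can be illustrated with a small sketch. The code below is not the authors' algorithm; it is one possible reading under stated assumptions: Laplace noise with scale 1/eps (each individual is assumed to change one count by at most 1), and a simple threshold filter as the summarization method. All function and parameter names (`dense_release`, `thresholded_summary`, `theta`) are hypothetical. The baseline perturbs all `m` cells; the summary version perturbs only the non-empty cells explicitly and samples which empty cells would have exceeded the threshold, so its expected cost depends on the number of non-empty cells plus the number of empty cells that pass the filter, rather than on `m`.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dense_release(counts, m, eps, rng):
    """Baseline: perturb every one of the m cells, including empty ones."""
    scale = 1.0 / eps  # assumption: sensitivity 1
    return [counts.get(i, 0) + laplace(scale, rng) for i in range(m)]


def thresholded_summary(counts, m, eps, theta, rng):
    """Keep only noisy counts above theta without visiting every cell.

    `counts` maps cell index -> count and is assumed to hold only the
    non-empty cells of a domain of size m.
    """
    if theta < 0:
        raise ValueError("this sketch assumes theta >= 0")
    scale = 1.0 / eps
    summary = {}

    # Cells present in the input: add noise explicitly, then filter.
    for i, c in counts.items():
        noisy = c + laplace(scale, rng)
        if noisy > theta:
            summary[i] = noisy

    # Empty cells: each would pass independently with probability
    # p = Pr[Laplace(scale) > theta] = 0.5 * exp(-theta / scale).
    p = 0.5 * math.exp(-theta / scale)
    log_q = math.log1p(-p)
    i = -1
    while True:
        # Geometric jump to the next index whose noise exceeds theta.
        i += int(math.log(1.0 - rng.random()) / log_q) + 1
        if i >= m:
            break
        if i not in counts:
            # Laplace noise conditioned on exceeding theta >= 0 is
            # theta plus an exponential with mean `scale`.
            summary[i] = theta + rng.expovariate(1.0 / scale)
    return summary


if __name__ == "__main__":
    rng = random.Random(0)
    counts = {3: 40, 17: 25, 99_000: 12}  # three non-empty cells
    m = 100_000                           # size of the underlying domain
    dense = dense_release(counts, m, eps=0.5, rng=rng)
    summary = thresholded_summary(counts, m, eps=0.5, theta=20.0, rng=rng)
    print(len(dense), len(summary))
```

With continuous noise, filtering the dense release and running `thresholded_summary` produce outputs with the same distribution, which is why the shortcut does not weaken the guarantee in this simplified setting. Larger `theta` gives a smaller summary at the cost of dropping more true small counts.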