UMicS: from anonymized data to usable microdata

Authors:
Graham Cormode;Entong Shen;Xi Gong;Ting Yu;Cecilia M. Procopiuc;Divesh Srivastava
Affiliations:
University of Warwick, Coventry, United Kingdom;North Carolina State University, Raleigh, NC, USA;North Carolina State University, Raleigh, NC, USA;North Carolina State University, Raleigh, NC, USA;AT&T Labs - Research, Florham Park, NJ, USA;AT&T Labs - Research, Florham Park, NJ, USA
Venue:
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Year:
2013

Citing 18
Cited 0

Privacy, accuracy, and consistency too: a holistic solution to contingency table release

Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Minimality attack in privacy preserving data publishing

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Universally utility-maximizing privacy mechanisms

Proceedings of the forty-first annual ACM symposium on Theory of computing
Privacy: Theory meets Practice on the Map

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Differentially private recommender systems: building privacy into the net

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Attacks on privacy and deFinetti's theorem

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Optimizing linear counting queries under differential privacy

Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Differentially private data release through multidimensional partitioning

SDM'10 Proceedings of the 7th VLDB conference on Secure data management
Differentially private data cubes: optimizing noise sources and consistency

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Privacy-preserving statistical estimation with optimal convergence rates

Proceedings of the forty-third annual ACM symposium on Theory of computing
Differentially private data release for data mining

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Sharing graphs using differentially private graph models

Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference
Differential privacy

ICALP'06 Proceedings of the 33rd international conference on Automata, Languages and Programming - Volume Part II
Our data, ourselves: privacy via distributed noise generation

EUROCRYPT'06 Proceedings of the 24th annual international conference on The Theory and Applications of Cryptographic Techniques
Calibrating noise to sensitivity in private data analysis

TCC'06 Proceedings of the Third conference on Theory of Cryptography
GUPT: privacy preserving data analysis made easy

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Differentially Private Spatial Decompositions

ICDE '12 Proceedings of the 2012 IEEE 28th International Conference on Data Engineering
Differentially Private Histogram Publication

ICDE '12 Proceedings of the 2012 IEEE 28th International Conference on Data Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

There is currently a tug-of-war going on surrounding data releases. On one side, there are many strong reasons pulling to release data to other parties: business factors, freedom of information rules, and scientific sharing agreements. On the other side, concerns about individual privacy pull back, and seek to limit releases. Privacy technologies such as differential privacy have been proposed to resolve this deadlock, and there has been much study of how to perform private data release of data in various forms. The focus of such works has been largely on the data owner: what process should they apply to ensure that the released data preserves privacy whilst still capturing the input data distribution accurately. Almost no attention has been paid to the needs of the data user, who wants to make use of the released data within their existing suite of tools and data. The difficulty of making use of data releases is a major stumbling block for the widespread adoption of data privacy technologies. In this paper, instead of proposing new privacy mechanisms for data publishing, we consider the whole data release process, from the data owner to the data user. We lay out a set of principles for privacy tool design that highlights the requirements for interoperability, extensibility and scalability. We put these into practice with UMicS, an end-to-end prototype system to control the release and use of private data. An overarching tenet is that it should be possible to integrate the released data into the data user's systems with the minimum of change and cost. We describe how to instantiate UMicS in a variety of usage scenarios. We show how using data modeling techniques from machine learning can improve the utility, in particular when combined with background knowledge that the data user may possess. We implement UMicS, and evaluate it over a selection of data sets and release cases. We see that UMicS allows for very effective use of released data, while upholding our privacy principles.