Optimal random perturbation at multiple privacy levels

Authors:
Xiaokui Xiao; Yufei Tao; Minghua Chen
Affiliations:
Nanyang Technological University, Singapore; Chinese University of Hong Kong, New Territories, Hong Kong; Chinese University of Hong Kong, New Territories, Hong Kong
Venue:
Proceedings of the VLDB Endowment
Year:
2009

Abstract

Random perturbation is a popular method of computing anonymized data for privacy-preserving data mining. It is simple to apply, ensures strong privacy protection, and permits effective mining of a large variety of data patterns. However, all existing studies with good privacy guarantees focus on perturbation at a single privacy level: a fixed degree of privacy protection is imposed on all anonymized data released by the data holder. This drawback seriously limits the applicability of random perturbation in scenarios where the holder has numerous recipients to which different privacy levels apply. Motivated by this, we study the problem of multi-level perturbation, whose objective is to release multiple versions of a dataset anonymized at different privacy levels. The challenge is that recipients may collude, sharing their data to infer private information beyond their permitted levels. Our solution overcomes this obstacle and achieves two crucial properties. First, collusion is useless: the colluding recipients cannot learn anything more than what the most trusted recipient among them already knows alone. Second, the data each recipient receives can be regarded as (and hence analyzed in the same way as) the output of conventional uniform perturbation. Besides its solid theoretical foundation, the proposed technique is both space-economical and computationally efficient. It requires O(n + m) expected space and produces a new anonymized version in O(n + log m) expected time, where n is the cardinality of the original dataset and m is the number of versions released previously. Both bounds are optimal under the realistic assumption that n ≫ m.
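
For readers unfamiliar with the term, the following is a minimal sketch of conventional single-level uniform perturbation as it is commonly defined in the literature: each value is retained with probability p and otherwise replaced by a value drawn uniformly from the attribute domain. It is not the multi-level algorithm proposed in the paper, which the abstract does not specify. The function names, the example domain, and the retention probability 0.3 are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def uniform_perturb(values, domain, p, rng):
    """Keep each value with probability p; otherwise replace it with a
    value drawn uniformly at random from the domain (hypothetical helper)."""
    return [v if rng.random() < p else rng.choice(domain) for v in values]


def estimate_frequency(perturbed, domain, p, target):
    """Estimate the original fraction of `target` from perturbed data.

    Under uniform perturbation the expected observed fraction is
    p * f + (1 - p) / |domain|, so we invert that relation for f.
    """
    observed = Counter(perturbed)[target] / len(perturbed)
    return (observed - (1 - p) / len(domain)) / p


# Illustrative data only: 30% of records carry the value "flu".
rng = random.Random(0)
domain = ["flu", "cold", "asthma", "none"]
original = ["flu"] * 3000 + ["none"] * 7000

released = uniform_perturb(original, domain, p=0.3, rng=rng)
estimate = estimate_frequency(released, domain, 0.3, "flu")
```

A smaller p corresponds to stronger perturbation, so a holder serving recipients at different privacy levels would use a different p for each. The abstract's second property says each recipient can analyze its version exactly as it would analyze output of this kind, for example with an estimator like the one above.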