Modern information technologies enable organizations to capture large quantities of person-specific data while providing routine services. Many organizations hope, or are legally required, to share such data for secondary purposes (e.g., to enable the validation of research findings) in a de-identified manner. In previous work, it was shown that de-identification policy alternatives can be represented as a lattice, which can be searched for policies that meet a prespecified risk threshold (e.g., a bound on the likelihood of re-identification). However, that search was limited in several ways. First, its definition of utility was syntactic, based on the level of the lattice, rather than semantic, based on the actual changes induced in the resulting data. Second, the threshold may not be known in advance. The goal of this work is to construct the set of policies that optimally trade off between privacy risk (R) and utility (U), which we refer to as an R-U frontier. To model the problem, we introduce a semantic, information-theoretic definition of utility that is compatible with the lattice representation of policies. To solve the problem, we first build an initial set of policies that defines a frontier, and then use a probability-guided heuristic to search the lattice for policies likely to update that frontier. To demonstrate the effectiveness of our approach, we perform an empirical analysis with the Adult dataset from the UCI Machine Learning Repository. We show that our approach constructs a frontier closer to optimal than competing approaches while searching a smaller number of policies. In addition, we show that a frequently followed de-identification policy (i.e., the Safe Harbor standard of the HIPAA Privacy Rule) is suboptimal relative to the frontier discovered by our approach.
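To make the risk-utility trade-off concrete, the following is a minimal Python sketch, not the authors' implementation: it scores a tiny lattice of generalization policies over two quasi-identifiers using a prosecutor-style re-identification risk and an entropy-based information loss (stand-ins for the risk and information-theoretic utility measures described above), and then keeps only the Pareto-optimal policies as an R-U frontier. The records, hierarchies, policy encoding, and measures are all illustrative assumptions.

# Minimal sketch (illustrative, not the authors' implementation) of building
# an R-U frontier over a small lattice of generalization policies.
from collections import Counter
from itertools import product
from math import log2

# Toy records with two quasi-identifiers: (age, zipcode). The paper uses the
# UCI Adult dataset; synthetic rows keep this example self-contained.
records = [(34, "37203"), (36, "37203"), (41, "37212"), (29, "37212"),
           (35, "37215"), (62, "37215"), (63, "37203"), (30, "37215")]

# Generalization hierarchies: level 0 is the exact value, higher levels are
# coarser, and the top level suppresses the value entirely.
def gen_age(age, level):
    return (age, f"{age // 10 * 10}s", "*")[level]

def gen_zip(zipcode, level):
    return (zipcode, zipcode[:3] + "**", "*")[level]

def apply_policy(policy):
    a_lvl, z_lvl = policy
    return [(gen_age(a, a_lvl), gen_zip(z, z_lvl)) for a, z in records]

def risk(released):
    # Prosecutor-style re-identification risk: average of 1/|equivalence class|.
    sizes = Counter(released)
    return sum(1.0 / sizes[row] for row in released) / len(released)

def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def utility_loss(released):
    # Entropy-based stand-in for an information-theoretic utility measure:
    # the fraction of the original entropy removed by generalization.
    h_orig = entropy(records)
    return 0.0 if h_orig == 0 else 1.0 - entropy(released) / h_orig

# Score every policy in the (tiny) lattice: one generalization level per
# quasi-identifier, so 3 x 3 = 9 policies in total.
scored = []
for policy in product(range(3), repeat=2):
    released = apply_policy(policy)
    scored.append((policy, risk(released), utility_loss(released)))

# The R-U frontier keeps only Pareto-optimal policies: no other policy has
# both lower (or equal) risk and lower (or equal) utility loss.
frontier = [(p, r, u) for (p, r, u) in scored
            if not any(r2 <= r and u2 <= u and (r2, u2) != (r, u)
                       for (_, r2, u2) in scored)]

for policy, r, u in sorted(frontier, key=lambda item: item[1]):
    print(f"policy={policy}  risk={r:.3f}  utility_loss={u:.3f}")

Because this example lattice has only nine policies, it is enumerated exhaustively; the probability-guided heuristic described above is what makes frontier construction tractable when the policy lattice is far too large to enumerate.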