Modern information technologies enable organizations to capture large quantities of person-specific data while providing routine services. Many organizations hope, or are legally required, to share such data for secondary purposes (e.g., to enable the validation of research findings) in a de-identified manner. In previous work, it was shown that de-identification policy alternatives can be represented as a lattice, which can be searched for policies that meet a prespecified risk threshold (e.g., a bound on the likelihood of re-identification). However, that search was limited in several ways. First, its definition of utility was syntactic, based on the level of the lattice, rather than semantic, based on the actual changes induced in the resulting data. Second, the threshold may not be known in advance. The goal of this work is to construct the set of policies that optimally trade off between privacy risk (R) and utility (U), which we refer to as an R-U frontier. To model the problem, we introduce a semantic, information-theoretic definition of utility that is compatible with the lattice representation of policies. To solve the problem, we first build an initial set of policies that defines a frontier, and then use a probability-guided heuristic to search the lattice for policies likely to update that frontier. To demonstrate the effectiveness of our approach, we perform an empirical analysis with the Adult dataset from the UCI Machine Learning Repository. We show that our approach constructs a frontier closer to optimal than competing approaches while searching a smaller number of policies. In addition, we show that a frequently followed de-identification policy (i.e., the Safe Harbor standard of the HIPAA Privacy Rule) is suboptimal relative to the frontier discovered by our approach.
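To make the risk-utility trade-off concrete, the following is a minimal Python sketch, not the authors' implementation: it scores a tiny lattice of generalization policies over two quasi-identifiers using a prosecutor-style re-identification risk and an entropy-based information loss (stand-ins for the risk and information-theoretic utility measures described above), and then keeps only the Pareto-optimal policies as an R-U frontier. The records, hierarchies, policy encoding, and measures are all illustrative assumptions.

# Minimal sketch (illustrative, not the authors' implementation) of building
# an R-U frontier over a small lattice of generalization policies.
from collections import Counter
from itertools import product
from math import log2

# Toy records with two quasi-identifiers: (age, zipcode). The paper uses the
# UCI Adult dataset; synthetic rows keep this example self-contained.
records = [(34, "37203"), (36, "37203"), (41, "37212"), (29, "37212"),
           (35, "37215"), (62, "37215"), (63, "37203"), (30, "37215")]

# Generalization hierarchies: level 0 is the exact value, higher levels are
# coarser, and the top level suppresses the value entirely.
def gen_age(age, level):
    return (age, f"{age // 10 * 10}s", "*")[level]

def gen_zip(zipcode, level):
    return (zipcode, zipcode[:3] + "**", "*")[level]

def apply_policy(policy):
    a_lvl, z_lvl = policy
    return [(gen_age(a, a_lvl), gen_zip(z, z_lvl)) for a, z in records]

def risk(released):
    # Prosecutor-style re-identification risk: average of 1/|equivalence class|.
    sizes = Counter(released)
    return sum(1.0 / sizes[row] for row in released) / len(released)

def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def utility_loss(released):
    # Entropy-based stand-in for an information-theoretic utility measure:
    # the fraction of the original entropy removed by generalization.
    h_orig = entropy(records)
    return 0.0 if h_orig == 0 else 1.0 - entropy(released) / h_orig

# Score every policy in the (tiny) lattice: one generalization level per
# quasi-identifier, so 3 x 3 = 9 policies in total.
scored = []
for policy in product(range(3), repeat=2):
    released = apply_policy(policy)
    scored.append((policy, risk(released), utility_loss(released)))

# The R-U frontier keeps only Pareto-optimal policies: no other policy has
# both lower (or equal) risk and lower (or equal) utility loss.
frontier = [(p, r, u) for (p, r, u) in scored
            if not any(r2 <= r and u2 <= u and (r2, u2) != (r, u)
                       for (_, r2, u2) in scored)]

for policy, r, u in sorted(frontier, key=lambda item: item[1]):
    print(f"policy={policy}  risk={r:.3f}  utility_loss={u:.3f}")

Because this example lattice has only nine policies, it is enumerated exhaustively; the probability-guided heuristic described above is what makes frontier construction tractable when the policy lattice is far too large to enumerate.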