Beyond safe harbor: automatic discovery of health information de-identification policy alternatives

Authors:
Kathleen Benitez;Grigorios Loukides;Bradley Malin
Affiliations:
Vanderbilt University, Nashville, TN, USA;Vanderbilt University, Nashville, TN, USA;Vanderbilt University, Nashville, TN, USA
Venue:
Proceedings of the 1st ACM International Health Informatics Symposium
Year:
2010

Abstract

Regulations in various countries permit the reuse of health information without patient authorization provided the data is "de-identified". In the United States, for instance, the Privacy Rule of the Health Insurance Portability and Accountability Act defines two distinct approaches to achieve de-identification: the first is Safe Harbor, which requires the removal of a list of identifiers, and the second is Expert Determination, which requires that an expert certify that the re-identification risk inherent in the data is sufficiently low. In reality, most healthcare organizations eschew the expert route because there are no standardized approaches and Safe Harbor is much simpler to interpret. This, however, precludes a wide range of worthwhile endeavors that depend on features suppressed by Safe Harbor, such as gerontological studies requiring detailed ages over 89. In response, we propose a novel approach to automatically discover alternative de-identification policies that carry no more re-identification risk than Safe Harbor. We model this task as a lattice-search problem, introduce a measure to capture the re-identification risk, and develop an algorithm that efficiently discovers policies by exploring the lattice. Using a cohort of approximately 3,000 patient records from the Vanderbilt University Medical Center, as well as the Adult dataset from the UCI Machine Learning Repository, we experimentally verify that a large number of alternative policies can be discovered in an efficient manner.
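
The lattice-search idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or risk measure, which the abstract does not specify. It assumes a toy dataset of (age, ZIP code) pairs, hypothetical generalization levels for each attribute, and a stand-in risk measure: the number of records that are unique after generalization. The Safe Harbor baseline is also simplified to two rules (ages of 90 and over grouped into one category, ZIP codes cut to three digits). All names below are illustrative.

```python
from collections import Counter

# Hypothetical toy records (age, 5-digit ZIP); not data from the paper.
RECORDS = [
    (34, "37203"), (36, "37203"), (35, "37212"), (38, "37212"),
    (91, "37203"), (93, "37205"), (67, "37205"), (62, "37205"),
]

# Assumed generalization levels, ordered from most to least specific.
# Each level is a coarsening of the one before it, so the lattice is nested.
AGE_LEVELS = [
    lambda a: a,              # 0: exact age
    lambda a: a // 5 * 5,     # 1: 5-year bins
    lambda a: a // 10 * 10,   # 2: 10-year bins
    lambda a: "*",            # 3: suppressed
]
ZIP_LEVELS = [
    lambda z: z,              # 0: full ZIP
    lambda z: z[:3],          # 1: first three digits
    lambda z: "*",            # 2: suppressed
]


def count_unique(keys):
    """Stand-in risk measure: number of records whose key occurs once."""
    counts = Counter(keys)
    return sum(1 for c in counts.values() if c == 1)


def policy_risk(policy, records):
    """Risk of a lattice node, given as (age_level, zip_level)."""
    a, z = policy
    return count_unique((AGE_LEVELS[a](age), ZIP_LEVELS[z](zp))
                        for age, zp in records)


def safe_harbor_risk(records):
    """Simplified Safe Harbor baseline: ages 90+ grouped, 3-digit ZIP."""
    return count_unique(("90+" if age >= 90 else age, zp[:3])
                        for age, zp in records)


def discover(records, max_risk):
    """Walk the lattice from the most general policy downward.

    Coarsening only merges groups, so the unique-record count never
    increases. A policy that exceeds max_risk therefore rules out every
    more specific policy below it, and that branch is pruned.
    """
    top = (len(AGE_LEVELS) - 1, len(ZIP_LEVELS) - 1)
    accepted, seen, frontier = [], {top}, [top]
    while frontier:
        policy = frontier.pop()
        if policy_risk(policy, records) > max_risk:
            continue  # prune: all specializations are at least as risky
        accepted.append(policy)
        for i in range(len(policy)):
            if policy[i] > 0:
                child = policy[:i] + (policy[i] - 1,) + policy[i + 1:]
                if child not in seen:
                    seen.add(child)
                    frontier.append(child)
    return sorted(accepted)


if __name__ == "__main__":
    baseline = safe_harbor_risk(RECORDS)
    for policy in discover(RECORDS, baseline):
        print(policy, policy_risk(policy, RECORDS))
```

On this toy data, the policy `(1, 0)` (5-year age bins with full ZIP codes) carries the same risk as the simplified baseline while keeping ages over 89 in 5-year bins, which is the kind of alternative the abstract describes. The pruning step relies on the risk measure being monotone under generalization; a different measure would need that property checked before the same search could be used.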