Secure construction of k-unlinkable patient records from distributed providers

Authors:
Bradley Malin
Affiliations:
Department of Biomedical Informatics, School of Medicine, 2525 West End Avenue, Suite 600, Vanderbilt University, Nashville, TN 37203, USA
Venue:
Artificial Intelligence in Medicine
Year:
2010

Abstract

Objectives: Healthcare organizations must adopt measures to uphold their patients' right to anonymity when sharing sensitive records, such as DNA sequences, with publicly accessible databanks. This is often achieved by suppressing patient-identifying information. Such a practice is insufficient, however, because the same organizations may disclose identified patient information, devoid of the sensitive information, for other purposes, and patients' organization-visit patterns, or trails, can then link the records back to the identities from which they were derived. There exist various algorithms that healthcare organizations can apply to ascertain when a patient's record is susceptible to trail re-identification, but they require organizations to exchange information regarding the identities of their patients prior to data protection certification. In this paper, we introduce an algorithmic approach to formally thwart trail re-identification in a secure setting.

Methods and materials: We present a framework that allows data holders to securely collaborate through a third party. In doing so, healthcare organizations keep all sensitive information in an encrypted state until the third party certifies that the data to be disclosed satisfies a formal data protection model. The model adopted for this work is an extended form of k-unlinkability, a protection model that, until this work, was applied in a non-secure setting only. Given the framework and protection model, we develop an algorithm to generate data that satisfies the protection model. In doing so, we enable healthcare organizations to prevent trail re-identification without revealing identified information.

Results: Theoretically, we prove that the proposed data protection model does not leak information, even in the context of an organization's prior knowledge. Empirically, we use real-world hospital discharge records to demonstrate that, while the secure protocol induces additional suppression of patient information in comparison to an existing non-secure approach, the quantity of data disclosed by the secure protocol remains substantial. For instance, in a population of over 7700 sickle cell anemia patients, the non-secure protocol discloses 99.48% of DNA records whereas the secure protocol permits the disclosure of 99.41%.

Conclusions: Our results demonstrate that healthcare organizations can collaborate to disclose significant quantities of personal biomedical data without violating their patients' anonymity in the process.
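
To make the notion of a trail concrete, the following is a minimal Python sketch of trail linkage and a k-unlinkability style check. It is an illustration only, not the algorithm or the secure protocol of the paper: it works on plaintext data, ignores the encrypted third-party setting entirely, and assumes that a record links to an identity only when their sets of visited organizations match exactly. The hospital labels (H1, H2, H3), patient names, record labels, and function names are all hypothetical.

```python
from collections import defaultdict


def link_candidates(identified, deidentified):
    """For each de-identified record, list the identities whose trail
    (set of visited organizations) matches the record's trail exactly."""
    by_trail = defaultdict(list)
    for name, trail in identified.items():
        by_trail[frozenset(trail)].append(name)
    return {
        record: sorted(by_trail.get(frozenset(trail), []))
        for record, trail in deidentified.items()
    }


def k_unlinkable_records(identified, deidentified, k):
    """Return the records that link to at least k identities; in this toy
    model, the remaining records would be withheld."""
    links = link_candidates(identified, deidentified)
    return sorted(rec for rec, names in links.items() if len(names) >= k)


# Hypothetical toy data: identified discharge trails and de-identified DNA trails.
identified = {
    "alice": {"H1", "H2"},
    "bob": {"H1", "H2"},
    "carol": {"H3"},
}
deidentified = {
    "dna1": {"H1", "H2"},
    "dna2": {"H1", "H2"},
    "dna3": {"H3"},
}

links = link_candidates(identified, deidentified)
released = k_unlinkable_records(identified, deidentified, k=2)
print(links)
print(released)
```

In this toy data, the record `dna3` matches only one identity, so its trail alone re-identifies it and it is withheld at k = 2, while `dna1` and `dna2` each remain ambiguous between two patients.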