A computational model to protect patient data from location-based re-identification

Authors:
Bradley Malin
Affiliations:
Department of Biomedical Informatics, Eskind Biomedical Library, Fourth Floor, 2209 Garland Avenue, Vanderbilt University, Nashville, TN 37232-8340, USA
Venue:
Artificial Intelligence in Medicine
Year:
2007

Abstract

Objective: Health care organizations must preserve a patient's anonymity when disclosing personal data. Traditionally, patient identity has been protected by stripping identifiers from sensitive data such as DNA. However, simple automated methods can re-identify patient data using public information. We address a threat to patient anonymity that arises when multiple health care organizations disclose data. In this setting, a patient's location visit pattern, or "trail", can link seemingly anonymous DNA back to the patient's identity. This threat exists because health care organizations (1) cannot prevent the disclosure of certain types of patient information and (2) do not know how to systematically avoid trail re-identification. In this paper, we develop and evaluate computational methods that health care organizations can apply to disclose patient-specific DNA records that are protected from trail re-identification.

Methods and materials: To prevent trail re-identification, we introduce a formal model called k-unlinkability, which enables health care administrators to specify different degrees of patient anonymity. Specifically, k-unlinkability is satisfied when the trail of each DNA record is linkable to at least k identified records. We present several algorithms that enable health care organizations to coordinate their data disclosure, so that they can determine which DNA records can be shared without violating k-unlinkability. We evaluate the algorithms with the trails of patient populations derived from publicly available hospital discharge databases. Algorithm efficacy is evaluated using metrics based on real-world applications, including the number of suppressed records and the number of organizations that disclose records.

Results: Our experiments indicate that it is unnecessary to suppress all patient records that initially violate k-unlinkability. Rather, only portions of the trails need to be suppressed.
For example, if each hospital discloses 100% of its data on patients diagnosed with cystic fibrosis, then 48% of the DNA records are 5-unlinkable. A naive solution would suppress the 52% of the DNA records that violate 5-unlinkability. However, by applying our protection algorithms, the hospitals can disclose 95% of the DNA records, all of which are 5-unlinkable. Similar findings hold for all populations studied.

Conclusion: This research demonstrates that patient anonymity can be formally protected in shared databases. Our findings illustrate that significant quantities of patient-specific data can be disclosed with provable protection from trail re-identification. The configurability of our methods allows health care administrators to quantify the effects of different levels of privacy protection and formulate policy accordingly.
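
The k-unlinkability condition and the idea of suppressing only part of a trail can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes that a trail is the set of hospitals at which a record appears, and that a DNA trail is "linkable" to an identified record when every hospital in the DNA trail also appears in the identified trail. The abstract does not state this subset test, and the function names, hospital labels, and toy records are all hypothetical.

```python
from typing import Dict, FrozenSet, List

# Assumption: a trail is modeled as the set of hospitals where a record appears.
Trail = FrozenSet[str]


def linkable_count(dna_trail: Trail, identified: Dict[str, Trail]) -> int:
    """Count identified records whose trail contains the whole DNA trail.

    The subset test is an illustrative assumption about what "linkable" means.
    """
    return sum(1 for trail in identified.values() if dna_trail <= trail)


def violators(dna: Dict[str, Trail], identified: Dict[str, Trail], k: int) -> List[str]:
    """Return DNA record ids linkable to fewer than k identified records."""
    return sorted(rid for rid, trail in dna.items()
                  if linkable_count(trail, identified) < k)


def suppress(dna: Dict[str, Trail], rid: str, hospital: str) -> Dict[str, Trail]:
    """Return a copy of dna in which one hospital withholds one DNA record."""
    updated = dict(dna)
    updated[rid] = dna[rid] - {hospital}
    return updated


# Hypothetical toy data: three identified patients and their DNA records.
identified = {
    "alice": frozenset({"H1", "H2"}),
    "bob": frozenset({"H1", "H2"}),
    "carol": frozenset({"H1", "H3"}),
}
dna = {
    "d1": frozenset({"H1", "H2"}),
    "d2": frozenset({"H1", "H2"}),
    "d3": frozenset({"H1", "H3"}),
}

before = violators(dna, identified, k=2)       # d3 links only to carol
partly_suppressed = suppress(dna, "d3", "H3")  # H3 withholds d3; H1 still shares it
after = violators(partly_suppressed, identified, k=2)
print(before, after)
```

In this toy case the record d3 initially links to a single identified patient, but once hospital H3 withholds it, the remaining trail {H1} is consistent with all three patients, so the record can still be disclosed by H1. This mirrors, in a very simplified form, the finding that only portions of trails need to be suppressed; the per-record count used here ignores any coordination between organizations that the paper's algorithms may rely on.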