Secure construction of k-unlinkable patient records from distributed providers
Artificial Intelligence in Medicine
Objective: Health care organizations must preserve a patient's anonymity when disclosing personal data. Traditionally, patient identity has been protected by stripping identifiers from sensitive data such as DNA. However, simple automated methods can re-identify patient data using public information. In this paper, we present a solution to a threat to patient anonymity that arises when multiple health care organizations disclose data. In this setting, a patient's pattern of location visits, or "trail", can link seemingly anonymous DNA back to patient identity. This threat exists because health care organizations (1) cannot prevent the disclosure of certain types of patient information and (2) do not know how to systematically avoid trail re-identification. We develop and evaluate computational methods that health care organizations can apply to disclose patient-specific DNA records that are resistant to trail re-identification.

Methods and materials: To prevent trail re-identification, we introduce a formal model called k-unlinkability, which enables health care administrators to specify different degrees of patient anonymity. Specifically, k-unlinkability is satisfied when the trail of each DNA record is linkable to no fewer than k identified records. We present several algorithms that enable health care organizations to coordinate their data disclosure, so that they can determine which DNA records can be shared without violating k-unlinkability. We evaluate the algorithms on the trails of patient populations derived from publicly available hospital discharge databases. Algorithm efficacy is measured with metrics grounded in real-world applications, including the number of suppressed records and the number of organizations that disclose records.

Results: Our experiments indicate that it is unnecessary to suppress every patient record that initially violates k-unlinkability; rather, only portions of the trails need to be suppressed. For example, if each hospital discloses 100% of its data on patients diagnosed with cystic fibrosis, then 48% of the DNA records are 5-unlinkable. A naive solution would suppress the 52% of DNA records that violate 5-unlinkability. By applying our protection algorithms, however, the hospitals can disclose 95% of the DNA records, all of which are 5-unlinkable. Similar findings hold for all populations studied.

Conclusion: This research demonstrates that patient anonymity can be formally protected in shared databases. Our findings illustrate that significant quantities of patient-specific data can be disclosed with provable protection from trail re-identification. The configurability of our methods allows health care administrators to quantify the effects of different levels of privacy protection and to formulate policy accordingly.
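To make the model concrete, the following Python sketch checks k-unlinkability and applies a naive per-record suppression strategy, assuming trails are represented as sets of hospital identifiers and that a de-identified DNA trail is linkable to an identified trail whenever it is a subset of it. The subset-based linkability test, the greedy heuristic, and all names (linkable_identities, greedy_suppress, H1, and so on) are illustrative assumptions, not the coordinated-disclosure algorithms evaluated in the paper.

```python
# Illustrative sketch of k-unlinkability over trails (hypothetical names).
# A trail is the set of hospitals at which a record was disclosed. We assume
# a de-identified DNA trail is "linkable" to an identified trail when every
# hospital in the DNA trail also appears in the identified trail.
from typing import Dict, FrozenSet, Set

Trail = FrozenSet[str]

def linkable_identities(dna_trail: Trail,
                        identity_trails: Dict[str, Trail]) -> Set[str]:
    """Identified records whose trails could correspond to this DNA trail."""
    return {pid for pid, t in identity_trails.items() if dna_trail <= t}

def is_k_unlinkable(dna_trail: Trail,
                    identity_trails: Dict[str, Trail], k: int) -> bool:
    """k-unlinkability: the DNA trail matches at least k identified records."""
    return len(linkable_identities(dna_trail, identity_trails)) >= k

def greedy_suppress(dna_trails: Dict[str, Trail],
                    identity_trails: Dict[str, Trail],
                    k: int) -> Dict[str, Trail]:
    """Suppress hospital entries from violating trails (rather than whole
    records) until each disclosed trail is k-unlinkable. A naive stand-in
    for the paper's coordinated-disclosure algorithms."""
    protected: Dict[str, Trail] = {}
    for rid, trail in dna_trails.items():
        current = set(trail)
        while current and not is_k_unlinkable(frozenset(current),
                                              identity_trails, k):
            # Dropping the hospital visited by the fewest identified patients
            # relaxes the tightest constraint on the candidate set.
            rarest = min(current,
                         key=lambda h: sum(h in t
                                           for t in identity_trails.values()))
            current.remove(rarest)
        if current:  # disclose the (possibly partially suppressed) trail
            protected[rid] = frozenset(current)
    return protected

if __name__ == "__main__":
    identities = {"p1": frozenset({"H1", "H2", "H3"}),
                  "p2": frozenset({"H1", "H3"}),
                  "p3": frozenset({"H2", "H3"})}
    dna = {"seq1": frozenset({"H1", "H2"}),  # matches only p1: violates k=2
           "seq2": frozenset({"H3"})}        # matches p1, p2, p3: 2-unlinkable
    print(greedy_suppress(dna, identities, k=2))
    # seq2 is disclosed intact; seq1 is disclosed with one hospital suppressed.
```

Under the subset test, suppressing a hospital from a trail can only enlarge its candidate set, so the loop always terminates; this echoes the paper's observation that suppressing portions of trails, rather than whole records, suffices to reach k-unlinkability.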