Secure construction of k-unlinkable patient records from distributed providers

  • Authors: Bradley Malin
  • Affiliations: Department of Biomedical Informatics, School of Medicine, 2525 West End Avenue, Suite 600, Vanderbilt University, Nashville, TN 37203, USA
  • Venue: Artificial Intelligence in Medicine
  • Year: 2010

Abstract

Objectives: Healthcare organizations must adopt measures to uphold their patients' right to anonymity when sharing sensitive records, such as DNA sequences, with publicly accessible databanks. This is often attempted by suppressing patient-identifying information; however, that practice is insufficient. The same organizations may disclose identified patient information, devoid of the sensitive information, for other purposes, and patients' organization-visit patterns, or trails, can then link the sensitive records back to the identities from which they were derived. Various algorithms exist that healthcare organizations can apply to determine when a patient's record is susceptible to trail re-identification, but they require organizations to exchange information about their patients' identities before data protection is certified. In this paper, we introduce an algorithmic approach to formally thwart trail re-identification in a secure setting.

Methods and materials: We present a framework that allows data holders to collaborate securely through a third party. Healthcare organizations keep all sensitive information in an encrypted state until the third party certifies that the data to be disclosed satisfies a formal data protection model. The model adopted for this work is an extended form of k-unlinkability, a protection model that, until this work, had been applied only in a non-secure setting. Given the framework and protection model, we develop an algorithm to generate data that satisfies the protection model. In doing so, we enable healthcare organizations to prevent trail re-identification without revealing identified information.

Results: Theoretically, we prove that the proposed data protection model does not leak information, even in the context of an organization's prior knowledge. Empirically, we use real-world hospital discharge records to demonstrate that, while the secure protocol induces additional suppression of patient information in comparison to an existing non-secure approach, the quantity of data disclosed by the secure protocol remains substantial. For instance, in a population of over 7700 sickle cell anemia patients, the non-secure protocol discloses 99.48% of DNA records, whereas the secure protocol permits the disclosure of 99.41%.

Conclusions: Our results demonstrate that healthcare organizations can collaborate to disclose significant quantities of personal biomedical data without violating their patients' anonymity in the process.
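
To make the notion of trail re-identification more concrete, the following Python sketch illustrates a simplified, non-secure check in the spirit of k-unlinkability. It is not the algorithm or the secure protocol from the paper: it assumes a trail is a tuple of 0/1 flags (one per organization), that a de-identified record is linkable to an identity only when their trails match exactly, and that the record is adequately protected when at least k identities share its trail. All function names, variable names, and the toy data are hypothetical.

```python
from collections import Counter


def linkable_counts(deidentified_trails, identified_trails):
    """For each de-identified record, count the identities whose trail matches exactly.

    Assumption: a trail is a tuple of 0/1 flags, one per organization,
    and linkage means exact equality of trails (a simplification).
    """
    identities_per_trail = Counter(identified_trails.values())
    return {
        record: identities_per_trail[trail]
        for record, trail in deidentified_trails.items()
    }


def records_at_risk(deidentified_trails, identified_trails, k):
    """Return the records linkable to fewer than k identities, in sorted order."""
    counts = linkable_counts(deidentified_trails, identified_trails)
    return sorted(record for record, n in counts.items() if n < k)


def satisfies_k_unlinkability(deidentified_trails, identified_trails, k):
    """True when every de-identified record is linkable to at least k identities."""
    return not records_at_risk(deidentified_trails, identified_trails, k)


# Toy data (hypothetical): three organizations, three patients.
identified = {
    "alice": (1, 1, 0),
    "bob": (1, 1, 0),
    "carol": (0, 1, 1),
}
dna = {
    "dna_1": (1, 1, 0),
    "dna_2": (1, 1, 0),
    "dna_3": (0, 1, 1),
}

at_risk = records_at_risk(dna, identified, k=2)
protected = satisfies_k_unlinkability(dna, identified, k=2)
```

In this toy data the trail of `dna_3` matches only one identity, so it would need to be suppressed or otherwise altered before disclosure at k = 2. In the setting the abstract describes, such a check would be carried out over encrypted data via a third party rather than on plaintext trails as shown here.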