Confidentiality issues for medical data miners

  • Authors:
  • Jules J Berman

  • Affiliations:
  • Pathology Informatics Cancer Diagnosis Program, DCTD, NCI, NIH, EPN-Room 6028, 6130 Executive Building, Rockville, MD 20892, USA

  • Venue:
  • Artificial Intelligence in Medicine
  • Year:
  • 2002

Abstract

The first task in any medical data mining effort is ensuring patient confidentiality. In the past, most data mining efforts ensured confidentiality by the dubious policy of withholding their raw data from colleagues and the public. A cursory review of the medical informatics literature of the past decade reveals that much of what we have "learned" consists of assertions derived from confidential datasets unavailable for anyone's review. Without access to the original data, it is impossible to validate or improve upon a researcher's conclusions, and we are asked to accept findings as an act of faith rather than as a scientific conclusion. This special issue of Artificial Intelligence in Medicine is devoted to medical data mining. The medical data miner has an obligation to conduct valid research in a way that protects human subjects. Today, data miners have the technical tools to merge large data collections and to distribute queries over disparate databases. To include patient-related data in shared databases, data miners will need methods to anonymize and deidentify data. This article reviews the human subject risks associated with medical data mining. It also describes some of the innovative computational remedies that will permit researchers both to conduct research and to share their data without risk to patient or institution.