On the challenges of balancing privacy and utility of open health data

  • Authors:
  • Christian Guttmann;Xingzhi Sun;Chaitanya Rao;Carlos Queiroz;Benjamin I. P. Rubinstein

  • Affiliations:
  • IBM Research - Australia;IBM Research - Australia;IBM Research - Australia;IBM Research - Australia;IBM Research - Australia

  • Venue:
  • Joint Proceedings of the Workshop on AI Problems and Approaches for Intelligent Environments and Workshop on Semantic Cities
  • Year:
  • 2013

Abstract

While health data has been collected at large scale for many years, it is often difficult to obtain for research purposes, in part because of the cost and complexity of preparing it for third parties. Health data must be adequately de-identified, a complex process that results in fully or partially "synthetic" data. This paper discusses the technological challenges that arise in this process when balancing the preservation of an individual's privacy against the preservation of the data's utility. Open health data is one example: de-identification is often so rigorous that the data becomes useless for meaningful observational studies. We make the discussion concrete by considering an open health data set released by the US Centers for Medicare & Medicaid Services (CMS).