While health data has been collected at large scale for many years, it is often difficult to obtain for research purposes, in part because of the cost and complexity of preparing it for third parties. Health data must be adequately de-identified, a complex process that yields fully or partially "synthetic" data. This paper discusses the technological challenges of this process when balancing the preservation of an individual's privacy against the preservation of the data's utility. Open health data is a case in point: de-identification is often so rigorous that the released data is useless for meaningful observational studies. We make our discussion concrete by considering an open health data set published by the U.S. Centers for Medicare & Medicaid Services (CMS).
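The privacy–utility tension can be illustrated with a toy sketch of generalization and suppression toward k-anonymity, a common building block of de-identification. The function name, field names, and binning choices below are illustrative assumptions, not taken from the paper:

```python
from collections import Counter

def deidentify(records, k=2):
    """Generalize quasi-identifiers, then suppress records whose
    quasi-identifier group has fewer than k members (k-anonymity).
    Illustrative sketch only; real pipelines use richer hierarchies."""
    # Generalize: coarsen age to a decade band, keep only a 3-digit ZIP prefix.
    generalized = [
        {"age": f"{(r['age'] // 10) * 10}-{(r['age'] // 10) * 10 + 9}",
         "zip": r["zip"][:3] + "**",
         "diagnosis": r["diagnosis"]}
        for r in records
    ]
    # Count group sizes over the quasi-identifier tuple (age band, ZIP prefix).
    groups = Counter((g["age"], g["zip"]) for g in generalized)
    # Suppress records whose group is smaller than k.
    return [g for g in generalized if groups[(g["age"], g["zip"])] >= k]

records = [
    {"age": 34, "zip": "10025", "diagnosis": "flu"},
    {"age": 36, "zip": "10027", "diagnosis": "asthma"},
    {"age": 71, "zip": "90210", "diagnosis": "diabetes"},
]
released = deidentify(records, k=2)
# The two records in the 30-39 / "100**" group survive; the lone
# 70-79 record is suppressed, trading utility (a lost record) for privacy.
```

Every coarsening or suppression step destroys detail that an observational study may need, which is precisely the tension the paper examines.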