While health data has been collected at large scale for many years, it is often difficult to obtain for research purposes, in part because of the cost and complexity of preparing it for third parties. Health data must be adequately de-identified, a complex process that yields fully or partially "synthetic" data. This paper discusses the technological challenges of this process when balancing the preservation of an individual's privacy against the preservation of the data's utility. Open health data is a case in point: de-identification is often so rigorous that the released data is useless for meaningful observational studies. We make our discussion concrete by considering an open health data set published by the U.S. Centers for Medicare & Medicaid Services (CMS).
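The privacy–utility tension can be illustrated with a toy sketch of generalization and suppression toward k-anonymity, a common building block of de-identification. The function name, field names, and binning choices below are illustrative assumptions, not taken from the paper:

```python
from collections import Counter

def deidentify(records, k=2):
    """Generalize quasi-identifiers, then suppress records whose
    quasi-identifier group has fewer than k members (k-anonymity).
    Illustrative sketch only; real pipelines use richer hierarchies."""
    # Generalize: coarsen age to a decade band, keep only a 3-digit ZIP prefix.
    generalized = [
        {"age": f"{(r['age'] // 10) * 10}-{(r['age'] // 10) * 10 + 9}",
         "zip": r["zip"][:3] + "**",
         "diagnosis": r["diagnosis"]}
        for r in records
    ]
    # Count group sizes over the quasi-identifier tuple (age band, ZIP prefix).
    groups = Counter((g["age"], g["zip"]) for g in generalized)
    # Suppress records whose group is smaller than k.
    return [g for g in generalized if groups[(g["age"], g["zip"])] >= k]

records = [
    {"age": 34, "zip": "10025", "diagnosis": "flu"},
    {"age": 36, "zip": "10027", "diagnosis": "asthma"},
    {"age": 71, "zip": "90210", "diagnosis": "diabetes"},
]
released = deidentify(records, k=2)
# The two records in the 30-39 / "100**" group survive; the lone
# 70-79 record is suppressed, trading utility (a lost record) for privacy.
```

Every coarsening or suppression step destroys detail that an observational study may need, which is precisely the tension the paper examines.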