A semantic framework to protect the privacy of electronic health records with non-numerical attributes

Authors:
Sergio Martínez; David Sánchez; Aida Valls
Affiliations:
Department of Computer Science and Mathematics, Universitat Rovira i Virgili, Av. Països Catalans, 26, 43007 Tarragona, Catalonia, Spain (all three authors)
Venue:
Journal of Biomedical Informatics
Year:
2013

Abstract

Structured patient data such as Electronic Health Records (EHRs) are a valuable source for clinical research. However, the sensitive nature of this information requires an anonymisation procedure to be applied before the data are released to third parties. Several studies have shown that removing identifying attributes, such as the Social Security Number, is not enough to obtain an anonymous data file, since unique combinations of other attributes (for example, rare diagnoses and personalised treatments) may still disclose a patient's identity. To tackle this problem, Statistical Disclosure Control (SDC) methods have been proposed to mask sensitive attributes while preserving, to a certain degree, the utility of the anonymised data. Most of these methods focus on continuous-scale numerical data. Because part of the clinical data found in EHRs is expressed with non-numerical attributes (for example, diagnoses, symptoms and procedures), applying these methods to EHRs produces far from optimal results. In this paper, we propose a general framework that enables the accurate application of SDC methods to non-numerical clinical data, with a focus on the preservation of semantics. To do so, we exploit structured medical knowledge bases such as SNOMED CT to define semantically-grounded operators to compare, aggregate and sort non-numerical terms. Our framework has been applied to several well-known SDC methods and evaluated using a real clinical dataset with non-numerical attributes. Results show that exploiting medical semantics produces anonymised datasets that better preserve the utility of EHRs.
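
To make the idea of comparison, aggregation and sorting operators over non-numerical terms more concrete, the following is a minimal sketch rather than the authors' method. It assumes a tiny hand-made taxonomy (the PARENT table, which is not real SNOMED CT content) and substitutes a plain edge-counting distance for whatever semantic measure the paper actually uses. The function names distance, centroid and sort_by_distance are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent). This is NOT SNOMED CT content;
# it only stands in for a structured medical knowledge base.
PARENT = {
    "disease": None,
    "respiratory disease": "disease",
    "cardiac disease": "disease",
    "asthma": "respiratory disease",
    "bronchitis": "respiratory disease",
    "angina": "cardiac disease",
}


def ancestors(term):
    """Return the path from term up to the root, with term itself first."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def distance(a, b):
    """Compare: number of edges from a and b to their lowest common ancestor.

    Edge counting is an assumption made for this sketch, not the paper's measure.
    """
    steps_from_b = {t: i for i, t in enumerate(ancestors(b))}
    for steps_from_a, t in enumerate(ancestors(a)):
        if t in steps_from_b:
            return steps_from_a + steps_from_b[t]
    raise ValueError("terms share no common ancestor")


def centroid(values):
    """Aggregate: the taxonomy term with the smallest summed distance to values."""
    counts = Counter(values)
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return min(
        sorted(PARENT),
        key=lambda c: sum(n * distance(c, v) for v, n in counts.items()),
    )


def sort_by_distance(values, reference):
    """Sort: order values by increasing distance to a reference term."""
    return sorted(values, key=lambda v: (distance(reference, v), v))


if __name__ == "__main__":
    group = ["asthma", "bronchitis", "angina"]
    print(distance("asthma", "angina"))
    print(centroid(group))
    print(sort_by_distance(group, "asthma"))
```

With operators of this kind, a numerical SDC method such as microaggregation could in principle group records by semantic distance and replace each group's values with the group centroid, instead of relying on arithmetic means that are undefined for terms like diagnoses.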