Structured patient data, such as Electronic Health Records (EHRs), are a valuable source for clinical research. However, the sensitive nature of this information requires that an anonymisation procedure be applied before the data are released to third parties. Several studies have shown that removing identifying attributes, such as the Social Security Number, is not enough to obtain an anonymous data file, since unique combinations of other attributes, such as rare diagnoses and personalised treatments, may still disclose the patient's identity. To tackle this problem, Statistical Disclosure Control (SDC) methods have been proposed to mask sensitive attributes while preserving, to a certain degree, the utility of the anonymised data. Most of these methods focus on continuous-scale numerical data. Because part of the clinical data found in EHRs is expressed with non-numerical attributes (for example, diagnoses, symptoms and procedures), applying these methods to EHRs produces far from optimal results. In this paper, we propose a general framework that enables the accurate application of SDC methods to non-numerical clinical data, with a focus on the preservation of semantics. To do so, we exploit structured medical knowledge bases such as SNOMED CT to define semantically-grounded operators that compare, aggregate and sort non-numerical terms. Our framework has been applied to several well-known SDC methods and evaluated on a real clinical dataset with non-numerical attributes. Results show that exploiting medical semantics produces anonymised datasets that better preserve the utility of EHRs.
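To make the idea of compare, aggregate and sort operators over non-numerical terms more concrete, the following is a minimal sketch and not the operators defined in the paper. It assumes a tiny hand-made "is-a" taxonomy (the `PARENT` dictionary, which is hypothetical and not actual SNOMED CT content), uses a plain edge-counting distance as a stand-in for a semantic similarity measure, and picks as centroid the taxonomy term with the smallest total distance to a group. All function names are illustrative.

```python
from typing import Dict, List, Optional

# Hypothetical toy "is-a" taxonomy (child -> parent); NOT real SNOMED CT content.
PARENT: Dict[str, Optional[str]] = {
    "disease": None,
    "infection": "disease",
    "cardiac disorder": "disease",
    "viral infection": "infection",
    "bacterial infection": "infection",
    "influenza": "viral infection",
    "arrhythmia": "cardiac disorder",
}


def ancestors(term: str) -> List[str]:
    """Return the term followed by all of its ancestors, up to the root."""
    path: List[str] = []
    node: Optional[str] = term
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def distance(a: str, b: str) -> int:
    """Compare: number of is-a edges on the path between a and b
    through their closest common ancestor (an assumed, simple measure)."""
    path_a = ancestors(a)
    depth_in_a = {term: i for i, term in enumerate(path_a)}
    for j, term in enumerate(ancestors(b)):
        if term in depth_in_a:
            return depth_in_a[term] + j
    raise ValueError("terms share no common ancestor")


def centroid(terms: List[str]) -> str:
    """Aggregate: the taxonomy term with the smallest summed distance
    to every term in the group (ties broken alphabetically)."""
    return min(sorted(PARENT), key=lambda c: sum(distance(c, t) for t in terms))


def sort_by_reference(terms: List[str], reference: str) -> List[str]:
    """Sort: order terms by their distance to a chosen reference term."""
    return sorted(terms, key=lambda t: (distance(t, reference), t))


group = ["influenza", "bacterial infection", "viral infection"]
print(distance("influenza", "bacterial infection"))
print(centroid(group))
print(sort_by_reference(["arrhythmia", "influenza", "bacterial infection"],
                        "viral infection"))
```

In a microaggregation-style method, the values of a group of similar records would be replaced by such a centroid, so that the masked value stays semantically close to the originals instead of being an arbitrary or most-frequent label.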