The Role of Ontologies in the Anonymization of Textual Variables

Authors:
Sergio Martínez;David Sánchez;Aïda Valls;Montserrat Batet
Affiliations:
Intelligent Technologies for Advanced Knowledge Acquisition (ITAKA) research group, Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, Av. Països Cat ...;Intelligent Technologies for Advanced Knowledge Acquisition (ITAKA) research group, Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, Av. Països Cat ...;Intelligent Technologies for Advanced Knowledge Acquisition (ITAKA) research group, Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, Av. Països Cat ...;Intelligent Technologies for Advanced Knowledge Acquisition (ITAKA) research group, Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, Av. Països Cat ...
Venue:
Proceedings of the 2010 conference on Artificial Intelligence Research and Development: Proceedings of the 13th International Conference of the Catalan Association for Artificial Intelligence
Year:
2010

Abstract

The exploitation of sensitive data associated with individuals requires proper anonymization in order to preserve privacy. Although several masking methods have been designed for numerical data, very few deal with textual information. During the masking process, information loss should be minimized so that the data can still be properly analyzed with data mining methods. For textual data, the quality of the anonymized dataset is closely related to the preservation of semantics, a dimension that previous works have considered only shallowly, using small, ad hoc hierarchies of words. In this work we study the use of large, standard ontologies as the basis for anonymizing textual variables. We evaluate the role of ontologies in preserving the utility of the anonymized information when the objects are partitioned with unsupervised clustering methods. Results show that exploiting detailed ontologies improves the preservation of data semantics in comparison to approaches based on ad hoc structures and data distribution metrics.
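
As a rough illustration of the idea in the abstract, the sketch below generalizes a group of textual values to their most specific common ancestor in an is-a hierarchy, and measures how far apart two terms are by counting edges. It is not the authors' method: the taxonomy, the term names, and the functions `ancestors`, `least_common_subsumer`, `path_distance` and `generalize` are all hypothetical and chosen only for the example. A real ontology would be far larger than this toy tree.

```python
# Toy is-a taxonomy (child -> parent). Purely illustrative, not from the paper.
PARENT = {
    "flu": "respiratory disease",
    "asthma": "respiratory disease",
    "respiratory disease": "disease",
    "diabetes": "metabolic disease",
    "metabolic disease": "disease",
    "disease": None,  # root concept
}


def ancestors(term):
    """Return the chain from term up to the root, term included."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def least_common_subsumer(a, b):
    """Most specific concept in the tree that generalizes both a and b."""
    seen = set(ancestors(a))
    for concept in ancestors(b):  # walks from most specific to root
        if concept in seen:
            return concept
    raise ValueError("terms share no ancestor")


def path_distance(a, b):
    """Number of is-a edges between a and b through their common subsumer."""
    lcs = least_common_subsumer(a, b)
    return ancestors(a).index(lcs) + ancestors(b).index(lcs)


def generalize(values):
    """Replace a group of values by one concept that covers all of them."""
    concept = values[0]
    for value in values[1:]:
        concept = least_common_subsumer(concept, value)
    return concept


if __name__ == "__main__":
    print(generalize(["flu", "asthma"]))
    print(generalize(["flu", "asthma", "diabetes"]))
    print(path_distance("flu", "diabetes"))
```

In this toy setting, grouping semantically close values ("flu" with "asthma") yields a more specific replacement than grouping distant ones, which is the intuition behind using semantic distance to limit information loss.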