Microdata provided by statistical agencies offer many benefits from the data mining point of view. However, such data often involve sensitive information that can be directly or indirectly related to individuals, so an appropriate anonymisation process is needed to minimise the risk of disclosure. Several masking methods have been developed to deal with continuous-scale numerical data or bounded textual values, but approaches to anonymising unbounded textual values are scarce and shallow. Given the importance of textual data in the Information Society, this paper presents a new masking method for anonymising unbounded textual values, based on fusing records with similar values to form groups of indistinguishable individuals. Since, from the data exploitation point of view, the utility of textual information is closely related to the preservation of its meaning, our method relies on the structured knowledge representation given by ontologies. This domain knowledge guides the masking process towards the merging that best preserves the semantics of the original data. Because textual data typically consist of large and heterogeneous value sets, our method achieves computational efficiency by relying on several heuristics rather than exhaustive searches. The method is evaluated with real data in a concrete data mining application that involves solving a clustering problem. We also compare it with more classical approaches that focus on optimising the value distribution of the dataset. Results show that a semantically grounded anonymisation best preserves the utility of the data in both the theoretical and the practical setting, and reduces the probability of record linkage. At the same time, it scales well with the size of the input data.
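The general idea of merging semantically similar textual values until every value is shared by at least k records can be illustrated with a minimal Python sketch. It is not the heuristic algorithm of the paper, whose details the abstract does not give. The toy taxonomy `PARENT`, the use of Wu-Palmer similarity as the semantic measure, and the greedy merge order in the hypothetical `anonymise` function are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); an assumption, not from the paper.
PARENT = {
    "flu": "infection",
    "cold": "infection",
    "infection": "disease",
    "diabetes": "metabolic_disorder",
    "obesity": "metabolic_disorder",
    "metabolic_disorder": "disease",
    "disease": None,
}


def ancestors(term):
    """Return the path from term up to the root, term included."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def least_common_subsumer(a, b):
    """Most specific concept that generalises both a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("terms share no ancestor")


def depth(term):
    """Number of nodes on the path to the root (the root has depth 1)."""
    return len(ancestors(term))


def wu_palmer(a, b):
    """Wu-Palmer similarity, used here as a stand-in semantic measure."""
    lcs = least_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def anonymise(values, k):
    """Greedily merge the rarest value with its most similar value.

    Both values are replaced everywhere by their least common subsumer,
    and the loop repeats until every distinct value occurs at least k
    times (or only one distinct value is left).
    """
    values = list(values)
    while True:
        counts = Counter(values)
        rare = [v for v, c in counts.items() if c < k]
        if not rare or len(counts) == 1:
            return values
        # Pick the rarest value; ties are broken alphabetically.
        target = min(rare, key=lambda v: (counts[v], v))
        others = [v for v in counts if v != target]
        partner = max(others, key=lambda v: (wu_palmer(target, v), v))
        merged = least_common_subsumer(target, partner)
        values = [merged if v in (target, partner) else v for v in values]


records = ["flu", "flu", "cold", "diabetes", "obesity", "obesity"]
masked = anonymise(records, k=2)
```

In this sketch each merge generalises two values to their closest common ancestor, so the masked values stay as specific as the taxonomy allows. The paper's method instead fuses records and evaluates utility on real data, which this toy example does not attempt.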