k-Anonymity in the Presence of External Databases

Authors:
Dimitris Sacharidis;Kyriakos Mouratidis;Dimitris Papadias
Affiliations:
Institute for the Management of Information, Athens and Hong Kong University of Science and Technology, Hong Kong;Singapore Management University, Singapore;Hong Kong University of Science and Technology, Hong Kong
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2010

Abstract

The concept of k-anonymity has received considerable attention because several organizations need to release microdata without revealing the identity of individuals. Although all previous k-anonymity techniques assume the existence of a public database (PD) that can be used to breach privacy, none utilizes PD during the anonymization process. Specifically, existing generalization algorithms create anonymous tables using only the microdata table (MT) to be published, independently of the available external knowledge. This omission leads to high information loss. Motivated by this observation, we first introduce the concept of k-join-anonymity (KJA), which permits more effective generalization and thereby reduces information loss. Briefly, KJA anonymizes a superset of MT that includes selected records from PD. We propose two methodologies for adapting k-anonymity algorithms to their KJA counterparts. The first generalizes the combination of MT and PD, under the constraint that each group must contain at least one tuple of MT (otherwise, the group is useless and is discarded). The second anonymizes MT and then refines the resulting groups using PD. Finally, we evaluate the effectiveness of our contributions through extensive experiments on real and synthetic data sets.
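
The first methodology can be illustrated with a toy sketch. The code below is not the algorithm of the paper: it assumes a single numeric quasi-identifier, a naive sort-and-cut grouping, and a simple loss measure (the width of the generalized range, summed over MT tuples). All function names and data values are hypothetical. It only shows why grouping MT together with PD records, and discarding groups that contain no MT tuple, may yield tighter ranges than grouping MT alone.

```python
from typing import List, Tuple

# A record is (quasi-identifier value, True if the tuple belongs to MT).
Record = Tuple[int, bool]


def greedy_groups(records: List[Record], k: int) -> List[List[Record]]:
    """Sort by the quasi-identifier and cut into consecutive groups of size >= k.

    Illustrative stand-in for a real generalization algorithm.
    """
    ordered = sorted(records)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Merge a short trailing group into its predecessor so no group is below k.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    return groups


def plain_groups(mt: List[int], k: int) -> List[List[Record]]:
    """Conventional setting: group using MT only."""
    return greedy_groups([(v, True) for v in mt], k)


def kja_groups(mt: List[int], pd_only: List[int], k: int) -> List[List[Record]]:
    """KJA-style setting: group MT together with PD records that are not in MT,
    then discard every group that contains no MT tuple."""
    combined = [(v, True) for v in mt] + [(v, False) for v in pd_only]
    return [g for g in greedy_groups(combined, k)
            if any(is_mt for _, is_mt in g)]


def mt_loss(groups: List[List[Record]]) -> int:
    """Assumed loss measure: generalized range width, summed over MT tuples."""
    loss = 0
    for g in groups:
        width = max(v for v, _ in g) - min(v for v, _ in g)
        loss += width * sum(1 for _, is_mt in g if is_mt)
    return loss


if __name__ == "__main__":
    mt = [20, 40, 60, 80]                 # hypothetical ages in MT
    pd_only = [21, 41, 61, 81, 100, 101]  # hypothetical PD records absent from MT
    print(mt_loss(plain_groups(mt, 2)))         # loss when only MT is used
    print(mt_loss(kja_groups(mt, pd_only, 2)))  # loss when PD records are included
```

In this toy data every MT tuple has a close PD neighbor, so the combined grouping produces narrow ranges, and the group formed by the two PD-only values 100 and 101 is dropped because it contains no MT tuple. With real data the benefit depends on how PD records are distributed relative to MT.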