Local and global recoding methods for anonymizing set-valued data

Authors:
Manolis Terrovitis;Nikos Mamoulis;Panos Kalnis
Affiliations:
Institute for the Management of Information Systems (IMIS), Research Center "Athena", Athens, Greece;Department of Computer Science, University of Hong Kong, Hong Kong, China;Division of Mathematical and Computer Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2011



Abstract

In this paper, we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of supermarket transactions that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer that could serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive; instead, we consider every item both a potential quasi-identifier and potentially sensitive, depending on the knowledge of the adversary. We define a new version of the k-anonymity guarantee, k^m-anonymity, to limit the effects of the data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization rather than on suppression, the most common practice in related work on such data. We develop an algorithm that finds the optimal solution, but at a cost so high that it is inapplicable to large, realistic problems. We then propose a greedy heuristic that performs generalizations in an Apriori, level-wise fashion. The heuristic scales much better and in most cases finds a solution close to the optimal. Finally, we investigate techniques that partition the database and perform anonymization locally, aiming to reduce memory consumption and improve scalability further. A thorough experimental evaluation with real datasets shows that a vertical partitioning approach achieves excellent results in practice.
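
The k^m-anonymity guarantee named in the abstract can be illustrated with a small check. The sketch below is not the paper's algorithm. It assumes the usual reading of the guarantee (every combination of at most m items that occurs in some transaction must be contained in at least k transactions) and tests it by brute-force enumeration. The function names `is_km_anonymous` and `generalize`, the sample transactions, and the item-to-category mapping are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(transactions, k, m):
    """Brute-force check (assumed reading of k^m-anonymity): every itemset
    of size <= m that appears in some transaction must be contained in at
    least k transactions."""
    support = Counter()
    for t in transactions:
        items = sorted(set(t))  # sort so each itemset has one canonical tuple
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                support[combo] += 1
    return all(count >= k for count in support.values())


def generalize(transactions, mapping):
    """Replace each item by its generalized value; unmapped items are kept.
    The mapping is a hypothetical one-level generalization hierarchy."""
    return [{mapping.get(item, item) for item in t} for t in transactions]


# Hypothetical supermarket transactions.
D = [
    {"skim milk", "beer"},
    {"whole milk", "beer"},
    {"skim milk", "wine"},
    {"whole milk", "wine"},
]

# Hypothetical generalization: both milk types become "milk".
to_milk = {"skim milk": "milk", "whole milk": "milk"}

print(is_km_anonymous(D, k=2, m=1))
print(is_km_anonymous(D, k=2, m=2))
print(is_km_anonymous(generalize(D, to_milk), k=2, m=2))
```

In this toy data every single item appears in two transactions, but each pair of items appears in only one, so an adversary who knows two items can single out a buyer. Generalizing the two milk types to a common category makes every pair appear twice. Enumerating all itemsets is exponential in m, so this check only illustrates the definition and says nothing about the efficiency of the algorithms the abstract describes.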