Non-homogeneous generalization in privacy preserving data publishing

Authors:
Wai Kit Wong;Nikos Mamoulis;David Wai Lok Cheung
Affiliations:
The University of Hong Kong, Hong Kong, Hong Kong;The University of Hong Kong, Hong Kong, Hong Kong;The University of Hong Kong, Hong Kong, Hong Kong
Venue:
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Year:
2010

Citing 23
Cited 7

Privacy-preserving data mining

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Protecting Respondents' Identities in Microdata Release

IEEE Transactions on Knowledge and Data Engineering
k-anonymity: a model for protecting privacy

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Privacy preserving mining of association rules

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Data Privacy through Optimal k-Anonymization

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
On the complexity of optimal K-anonymity

PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Deriving private information from randomized data

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Incognito: efficient full-domain K-anonymity

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Mondrian Multidimensional K-Anonymity

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
\ell -Diversity: Privacy Beyond \kappa -Anonymity

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
(α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Utility-based anonymization using local recoding

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Anatomy: simple and effective privacy preservation

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
The new Casper: query processing for location services without compromising privacy

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Revisiting the uniqueness of simple demographics in the US population

Proceedings of the 5th ACM workshop on Privacy in electronic society
M-invariance: towards privacy preserving re-publication of dynamic datasets

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Preventing Location-Based Identity Inference in Anonymous Spatial Queries

IEEE Transactions on Knowledge and Data Engineering
Minimality attack in privacy preserving data publishing

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
K-anonymization as spatial indexing: toward scalable and incremental anonymization

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Fast data anonymization with low information loss

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
On Anti-Corruption Privacy Preserving Publication

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
k-Anonymization Revisited

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Attacks on privacy and deFinetti's theorem

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Cloning for privacy protection in multiple independent data publications

Proceedings of the 20th ACM international conference on Information and knowledge management
Limiting disclosure of sensitive data in sequential releases of databases

Information Sciences: an International Journal
Anonymizing set-valued data by nonreciprocal recoding

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering-Based k-anonymity

PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Fast clustering-based anonymization approaches with time constraints for data streams

Knowledge-Based Systems
Improving accuracy of classification models induced from anonymized datasets

Information Sciences: an International Journal
A general framework for privacy preserving data publishing

Knowledge-Based Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Most previous research on privacy-preserving data publishing, based on the k-anonymity model, has followed the simplistic approach of homogeneously giving the same generalized value in all quasi-identifiers within a partition. We observe that the anonymization error can be reduced if we follow a non-homogeneous generalization approach for groups of size larger than k. Such an approach would allow tuples within a partition to take different generalized quasi-identifier values. Anonymization following this model is not trivial, as its direct application can easily violate k-anonymity. In addition, non-homogeneous generalization allows for additional types of attack, which should be considered in the process. We provide a methodology for verifying whether a non-homogeneous generalization violates k-anonymity. Then, we propose a technique that generates a non-homogeneous generalization for a partition and show that its result satisfies k-anonymity, however by straightforwardly applying it, privacy can be compromised if the attacker knows the anonymization algorithm. Based on this, we propose a randomization method that prevents this type of attack and show that k-anonymity is not compromised by it. Nonhomogeneous generalization can be used on top of any existing partitioning approach to improve its utility. In addition, we show that a new partitioning technique tailored for non-homogeneous generalization can further improve quality. A thorough experimental evaluation demonstrates that our methodology greatly improves the utility of anonymized data in practice.