Hiding distinguished ones into crowd: privacy-preserving publishing data with outliers

Authors:
Hui (Wendy) Wang; Ruilin Liu
Affiliations:
Stevens Institute of Technology, Hoboken, NJ; Stevens Institute of Technology, Hoboken, NJ
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009

Abstract

Publishing microdata raises concerns about individual privacy. When the microdata contains outlier records, their distinguishability makes the outliers' privacy easier to compromise than that of regular records. However, none of the existing anonymization techniques provides sufficient protection for the privacy of outliers. In this paper, we study the problem of anonymizing microdata that contains outliers. We define the distinguishability-based attack, by which the adversary can infer the existence of outliers as well as their private information from the anonymized microdata. To defend against this attack, we define plain k-anonymity as the privacy principle. Based on this definition, we categorize outliers into two types: those that cannot be hidden by any plain k-anonymous group (called global outliers) and those that can (called local outliers). We propose an algorithm that efficiently anonymizes local outliers with low information loss. Our experiments demonstrate the efficiency and effectiveness of our approach.
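The abstract does not spell out how plain k-anonymity or outlier membership is decided, so what follows is only a minimal Python sketch of the two standard ingredients the problem builds on, under loudly stated assumptions: quasi-identifier (QI) values are numeric; standard k-anonymity (every generalized QI value is shared by at least k records) stands in for the authors' plain k-anonymity; and Knorr and Ng's distance-based DB(p, D) outliers stand in for whatever outlier notion the paper uses. The function names (db_outliers, generalize, publish, is_k_anonymous) and the bounding-box generalization are hypothetical illustrations, not the paper's algorithm.

```python
from collections import Counter
from math import dist
from typing import Sequence

# Assumption: a record's quasi-identifier (QI) values are numeric.
Record = tuple[float, ...]
# A generalized record: one [min, max] range per QI attribute.
Box = tuple[tuple[float, float], ...]


def db_outliers(points: Sequence[Record], p: float, d: float) -> list[int]:
    """Indices of DB(p, D)-outliers in the sense of Knorr and Ng: a point is
    an outlier if at least a fraction p of all points lie farther than d
    from it. Naive O(n^2) scan, for illustration only."""
    n = len(points)
    result = []
    for i, x in enumerate(points):
        # x itself is at distance 0, so it never counts as "far".
        far = sum(1 for y in points if dist(x, y) > d)
        if far >= p * n:
            result.append(i)
    return result


def generalize(group: Sequence[Record]) -> Box:
    """Hypothetical generalization: replace each QI attribute of the group
    by its [min, max] range (a bounding box)."""
    return tuple((min(col), max(col)) for col in zip(*group))


def publish(points: Sequence[Record], groups: Sequence[Sequence[int]]) -> list[Box]:
    """Replace every record's QI values by the bounding box of its group."""
    published: list[Box] = [()] * len(points)
    for g in groups:
        box = generalize([points[i] for i in g])
        for i in g:
            published[i] = box
    return published


def is_k_anonymous(table: Sequence[Box], k: int) -> bool:
    """Assumption: STANDARD k-anonymity, not the paper's plain k-anonymity.
    Each distinct generalized QI value must occur at least k times."""
    return all(c >= k for c in Counter(table).values())


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (10.0, 10.0)]
    print(db_outliers(pts, p=0.75, d=3.0))                            # [4]
    print(is_k_anonymous(publish(pts, [[0, 1], [2, 3], [4]]), k=2))  # False
    print(is_k_anonymous(publish(pts, [[0, 1, 2], [3, 4]]), k=2))    # True
    print(generalize([pts[3], pts[4]]))  # ((2.0, 10.0), (2.0, 10.0))
```

In this toy data the point (10, 10) is the only DB(0.75, 3) outlier. Left in a group of its own it breaks 2-anonymity; grouping it with (2, 2) satisfies standard 2-anonymity, but the shared range widens to [2, 10] on both attributes, which hints at why hiding outliers tends to cost more information than hiding regular records.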