To do or not to do: the dilemma of disclosing anonymized data

Authors:
Laks V. S. Lakshmanan; Raymond T. Ng; Ganesh Ramesh
Affiliations:
University of British Columbia (all authors)
Venue:
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Year:
2005

Abstract

Decision makers at companies often face a dilemma: whether to release data for knowledge discovery, weighed against the risk of disclosing proprietary or sensitive information. While there are various "sanitization" methods, in this paper we focus on anonymization, given its widespread use in practice. We give due diligence to the question of "just how safe the anonymized data is" in terms of protecting the true identities of the data objects. We consider both the scenario in which the hacker has no information and, more realistically, the scenario in which the hacker may have partial information about items in the domain. We conduct our analyses in the context of frequent set mining. We propose to capture the prior knowledge of the hacker by means of a belief function, in which an educated guess of the frequency of each item is assumed. For various classes of belief functions, which correspond to different degrees of prior knowledge, we derive formulas for computing the expected number of "cracks". Since obtaining the exact values for the more general situations is computationally hard, we propose a heuristic called the O-estimate. It is easy to compute and is shown empirically to be accurate on real benchmark datasets. Finally, based on the O-estimates, we propose a recipe for decision makers to resolve their dilemma.
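
To make the notion of an expected number of "cracks" concrete, the following is a minimal sketch under assumptions that the abstract does not spell out. It assumes the belief function assigns each item a frequency interval, that the hacker considers a mapping from anonymized labels back to items consistent when every observed frequency falls inside the guessed item's interval, and that the hacker picks uniformly among consistent mappings. The function name `expected_cracks` and its arguments are hypothetical. It enumerates all mappings by brute force, so it is only feasible for tiny domains, and it is not the O-estimate.

```python
from fractions import Fraction
from itertools import permutations


def expected_cracks(true_freq, belief):
    """Expected number of correctly re-identified items (illustrative only).

    true_freq: dict mapping each item to its frequency in the released data;
               anonymization hides the item's name but not this frequency.
    belief:    dict mapping each item to an assumed (low, high) interval,
               the hacker's guess about that item's frequency.
    """
    items = sorted(true_freq)
    n = len(items)
    consistent = 0    # number of mappings compatible with the belief
    total_cracks = 0  # correct guesses summed over consistent mappings

    # perm[i] == j means: the anonymized copy of items[i] is guessed to be items[j].
    for perm in permutations(range(n)):
        compatible = all(
            belief[items[j]][0] <= true_freq[items[i]] <= belief[items[j]][1]
            for i, j in enumerate(perm)
        )
        if compatible:
            consistent += 1
            total_cracks += sum(1 for i, j in enumerate(perm) if i == j)

    if consistent == 0:
        raise ValueError("belief function is incompatible with the data")
    return Fraction(total_cracks, consistent)


freqs = {"a": 0.5, "b": 0.4, "c": 0.2, "d": 0.1}

# No prior knowledge: every interval is [0, 1], so every mapping is consistent.
ignorant = {item: (0.0, 1.0) for item in freqs}
print(expected_cracks(freqs, ignorant))

# Partial knowledge: "a" and "b" cannot be told apart, "c" and "d" are pinned down.
partial = {"a": (0.35, 0.6), "b": (0.35, 0.6), "c": (0.15, 0.25), "d": (0.0, 0.12)}
print(expected_cracks(freqs, partial))
```

Under these assumptions the ignorant case yields 1, the expected number of fixed points of a uniformly random permutation, regardless of domain size. In the partial case only two mappings are consistent (the identity, and the one that swaps "a" and "b"), giving (4 + 2) / 2 = 3 expected cracks. Enumerating consistent mappings amounts to counting perfect matchings in a bipartite graph, which suggests why exact values are costly in general and a cheap estimate is attractive.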