On disclosure risk analysis of anonymized itemsets in the presence of prior knowledge

Authors:
Laks V. S. Lakshmanan;Raymond T. Ng;Ganesh Ramesh
Affiliations:
University of British Columbia, Vancouver, B.C.;University of British Columbia, Vancouver, B.C.;Microsoft Corporation, Redmond, WA
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2008

Abstract

Decision makers of companies often face the dilemma of whether to release data for knowledge discovery, vis-a-vis the risk of disclosing proprietary or sensitive information. Among the various methods employed for “sanitizing” the data prior to disclosure, we focus in this article on anonymization, given its widespread use in practice. We do due diligence on the question “just how safe is the anonymized data?” We consider both the scenario in which the hacker has no information and, more realistically, the scenario in which the hacker has partial information about items in the domain. We conduct our analyses in the context of frequent set mining and address the safety question at two different levels: (i) how likely are the identities of individual items to be cracked (i.e., re-identified by a hacker), and (ii) how likely are sets of items to be cracked? To capture the prior knowledge of the hacker, we propose a belief function, which amounts to an educated guess of the frequency of each item. For various classes of belief functions, which correspond to different degrees of prior knowledge, we derive formulas for computing the expected number of cracks of single items and, for itemsets, the probability of cracking them. While obtaining exact values for more general situations is computationally hard, we propose a series of heuristics called the O-estimates. They are easy to compute and are shown to be fairly accurate by empirical results on real benchmark datasets. Based on the O-estimates, we propose a recipe for decision makers to resolve their dilemma. Our recipe operates at two different levels, depending on whether the data owner wants to reason in terms of single items or sets of items (or both). Finally, we present techniques for ascertaining a hacker's knowledge of correlation in terms of the likely co-occurrence of items. This information regarding the hacker's knowledge can be incorporated into our framework of disclosure risk analysis, and we present experimental results demonstrating how this knowledge affects the heuristic estimates we have developed.
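
To make the notion of a "crack" concrete, the following is a minimal illustrative sketch, not the article's O-estimates or its exact formulas. It assumes anonymization is a random one-to-one relabeling of items, and it models a hypothetical hacker who knows every item's exact frequency and guesses uniformly among the anonymized labels that share that frequency. All function names (`anonymize`, `guess_mapping`, `average_cracks`) and the toy transactions are assumptions made for illustration. Under these assumptions, each group of items with the same frequency contributes one expected crack, so the simulated average should be close to the number of distinct frequency values.

```python
import random
from collections import Counter, defaultdict


def anonymize(transactions, rng):
    """Relabel every item with an opaque integer via a random bijection."""
    items = sorted({item for t in transactions for item in t})
    labels = list(range(len(items)))
    rng.shuffle(labels)
    mapping = dict(zip(items, labels))  # true item -> anonymized label
    anonymized = [frozenset(mapping[i] for i in t) for t in transactions]
    return anonymized, mapping


def item_frequencies(transactions):
    """Count how many transactions contain each item."""
    return Counter(item for t in transactions for item in set(t))


def guess_mapping(true_freq, anon_freq, rng):
    """Hypothetical hacker: knows exact item frequencies and guesses
    uniformly at random among anonymized labels with the same frequency."""
    items_by_freq = defaultdict(list)
    for item, f in true_freq.items():
        items_by_freq[f].append(item)
    labels_by_freq = defaultdict(list)
    for label, f in anon_freq.items():
        labels_by_freq[f].append(label)
    guess = {}
    for f, group in items_by_freq.items():
        candidates = labels_by_freq[f][:]
        rng.shuffle(candidates)
        guess.update(zip(group, candidates))
    return guess


def count_cracks(guess, mapping):
    """A crack is an item whose anonymized label was guessed correctly."""
    return sum(1 for item, label in guess.items() if mapping[item] == label)


def average_cracks(transactions, trials=2000, seed=0):
    """Monte Carlo estimate of the expected number of cracked items."""
    rng = random.Random(seed)
    true_freq = item_frequencies(transactions)
    total = 0
    for _ in range(trials):
        anonymized, mapping = anonymize(transactions, rng)
        anon_freq = item_frequencies(anonymized)
        guess = guess_mapping(true_freq, anon_freq, rng)
        total += count_cracks(guess, mapping)
    return total / trials


if __name__ == "__main__":
    # Toy data: a and b appear 3 times, d twice, c once (3 frequency groups).
    toy = [{"a", "b"}, {"a", "b", "d"}, {"a", "b", "c"}, {"d"}]
    print(average_cracks(toy))
```

In the toy data, items `c` and `d` have unique frequencies and are always cracked, while `a` and `b` share a frequency and are both cracked or both missed. A data owner could use this kind of simulation as a sanity check on how much exact frequency knowledge alone reveals before turning to richer models of prior knowledge.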