Limiting disclosure of sensitive data in sequential releases of databases

  • Authors:
  • Erez Shmueli, Tamir Tassa, Raz Wasserstein, Bracha Shapira, Lior Rokach

  • Affiliations:
  • Erez Shmueli, Raz Wasserstein, Bracha Shapira, Lior Rokach: Deutsche Telekom Laboratories and the Department of Information Systems Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel
  • Tamir Tassa: Division of Computer Science, The Open University, Ra'anana, Israel

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2012


Abstract

Privacy Preserving Data Publishing (PPDP) is a research field that develops methods for publishing data while minimizing distortion, so as to maintain usability on the one hand and respect privacy on the other. Sequential release is a data publishing scenario in which multiple releases of the same underlying table are published over a period of time. A violation of privacy, in this case, may emerge from any single release, or from joining information across different releases. As in [37], our privacy definitions limit the ability of an adversary who combines information from all releases to link values of the quasi-identifiers to sensitive values. We extend the framework considered in Ref. [37] in three ways: we allow a greater number of releases, we adopt the more flexible local recoding model of "cell generalization" (as opposed to the global recoding model of "cut generalization" in Ref. [37]), and we include the case where records may be added to the underlying table from time to time. Extending the framework also requires modifying the manner in which privacy is evaluated. We show that while the privacy evaluation in [37] was based on the notion of the Match Join between the releases, that notion is no longer suitable for the extended framework considered here. We define more restrictive types of join between the published releases (the Full Match Join and the Kernel Match Join) that are better suited to privacy evaluation in this context. We then present a top-down algorithm, based on our modified privacy evaluations, for anonymizing sequential releases in the cell generalization model. Our theoretical study is followed by experimentation that demonstrates a staggering improvement in utility due to the adoption of the cell generalization model, and exemplifies the correction to the privacy evaluation afforded by using the Full or Kernel Match Joins instead of the Match Join.
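The distinction between the two recoding models mentioned above can be illustrated with a toy sketch (hypothetical data and helper names; this is not the authors' anonymization algorithm). Under global recoding ("cut generalization"), a generalization step applies to an entire quasi-identifier column in every record; under local recoding ("cell generalization"), individual cells may be generalized independently, so records that do not need generalization to meet the privacy requirement can be published intact, reducing distortion:

```python
def generalize_age(age):
    """Map an exact age to a 10-year interval, e.g. 34 -> '30-39'."""
    lo = (age // 10) * 10
    return f"{lo}-{lo + 9}"

# Toy underlying table: 'age' is a quasi-identifier, 'disease' is sensitive.
table = [
    {"age": 34, "disease": "flu"},
    {"age": 36, "disease": "cold"},
    {"age": 52, "disease": "flu"},
]

# Global recoding (cut generalization): the age column is generalized
# in ALL records, whether or not each record needs it.
global_recoded = [dict(r, age=generalize_age(r["age"])) for r in table]

# Local recoding (cell generalization): only the cells that must be
# generalized are changed; here (arbitrarily, for illustration) the
# first two records form a risky group while the third is left exact.
local_recoded = [
    dict(r, age=generalize_age(r["age"])) if i < 2 else dict(r)
    for i, r in enumerate(table)
]

print(global_recoded[2]["age"])  # '50-59' -- generalized, more distortion
print(local_recoded[2]["age"])   # 52 -- published exactly, less distortion
```

The extra flexibility of per-cell decisions is what underlies the utility improvement reported in the abstract, at the cost of a larger search space for the anonymization algorithm.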