Limiting disclosure of sensitive data in sequential releases of databases

  • Authors:
  • Erez Shmueli, Tamir Tassa, Raz Wasserstein, Bracha Shapira, Lior Rokach

  • Affiliations:
  • Erez Shmueli, Raz Wasserstein, Bracha Shapira, Lior Rokach: Deutsche Telekom Laboratories and the Department of Information Systems Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel
  • Tamir Tassa: Division of Computer Science, The Open University, Ra'anana, Israel

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2012


Abstract

Privacy Preserving Data Publishing (PPDP) is a research field that develops methods for publishing data while minimizing distortion, so as to maintain usability on the one hand and respect privacy on the other. Sequential release is a data publishing scenario in which multiple releases of the same underlying table are published over a period of time. A violation of privacy, in this case, may emerge from any single release, or from joining information across different releases. As in [37], our privacy definitions limit the ability of an adversary who combines information from all releases to link values of the quasi-identifiers to sensitive values. We extend the framework considered in Ref. [37] in three ways: we allow a greater number of releases, we adopt the more flexible local recoding model of "cell generalization" (as opposed to the global recoding model of "cut generalization" in Ref. [37]), and we include the case where records may be added to the underlying table from time to time. Extending the framework also requires modifying the manner in which privacy is evaluated. We show that while the privacy evaluation in [37] was based on the notion of the Match Join between the releases, that notion is no longer suitable for the extended framework considered here. We define more restrictive types of join between the published releases (the Full Match Join and the Kernel Match Join) that are better suited to privacy evaluation in this context. We then present a top-down algorithm, based on our modified privacy evaluations, for anonymizing sequential releases in the cell generalization model. Our theoretical study is followed by experimentation that demonstrates a staggering improvement in utility due to the adoption of the cell generalization model, and exemplifies the correction to the privacy evaluation afforded by using the Full or Kernel Match Joins instead of the Match Join.
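The distinction between the two recoding models mentioned above can be illustrated with a toy sketch (hypothetical data and helper names; this is not the authors' anonymization algorithm). Under global recoding ("cut generalization"), a generalization step applies to an entire quasi-identifier column in every record; under local recoding ("cell generalization"), individual cells may be generalized independently, so records that do not need generalization to meet the privacy requirement can be published intact, reducing distortion:

```python
def generalize_age(age):
    """Map an exact age to a 10-year interval, e.g. 34 -> '30-39'."""
    lo = (age // 10) * 10
    return f"{lo}-{lo + 9}"

# Toy underlying table: 'age' is a quasi-identifier, 'disease' is sensitive.
table = [
    {"age": 34, "disease": "flu"},
    {"age": 36, "disease": "cold"},
    {"age": 52, "disease": "flu"},
]

# Global recoding (cut generalization): the age column is generalized
# in ALL records, whether or not each record needs it.
global_recoded = [dict(r, age=generalize_age(r["age"])) for r in table]

# Local recoding (cell generalization): only the cells that must be
# generalized are changed; here (arbitrarily, for illustration) the
# first two records form a risky group while the third is left exact.
local_recoded = [
    dict(r, age=generalize_age(r["age"])) if i < 2 else dict(r)
    for i, r in enumerate(table)
]

print(global_recoded[2]["age"])  # '50-59' -- generalized, more distortion
print(local_recoded[2]["age"])   # 52 -- published exactly, less distortion
```

The extra flexibility of per-cell decisions is what underlies the utility improvement reported in the abstract, at the cost of a larger search space for the anonymization algorithm.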