Exploring privacy versus data quality trade-offs in anonymization techniques using multi-objective optimization

  • Authors:
  • Rinku Dewri; Indrajit Ray; Indrakshi Ray; Darrell Whitley

  • Affiliations:
  • Department of Computer Science, University of Denver, Denver, CO, USA. Email: rdewri@cs.du.edu
  • Department of Computer Science, Colorado State University, Fort Collins, CO, USA. E-mails: {indrajit, iray, whitley}@cs.colostate.edu

  • Venue:
  • Journal of Computer Security
  • Year:
  • 2011

Abstract

Data anonymization techniques have received extensive attention in the privacy research community over the past several years. Various models of privacy preservation have been proposed: k-anonymity, ℓ-diversity and t-closeness, to name a few. An oft-cited drawback of these models is the considerable loss in data quality arising from the use of generalization and suppression techniques. Optimization attempts in this context have so far focused on maximizing data utility for a pre-specified level of privacy. To determine whether better privacy levels are obtainable at the same level of data utility, the majority of existing formulations require exhaustive analysis. Further, the data publisher's perspective is often missed in the process. The publisher wishes to maintain a given level of data utility, since data utility is the revenue earner, and then maximize the level of privacy within acceptable limits. In this paper, we explore this privacy versus data quality trade-off as a multi-objective optimization problem. Our goal is to provide substantial information to a data publisher about the trade-offs available between the privacy level and the information content of an anonymized data set.
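The trade-off the abstract describes can be made concrete with a small sketch. The following is an illustrative example only, not the authors' formulation: it enumerates candidate generalization schemes over a hypothetical toy table (the data, the generalization hierarchy, and the information-loss proxy are all assumptions made here for illustration), scores each scheme on two objectives (the k of k-anonymity and a simple loss measure), and keeps the Pareto-optimal schemes, i.e., those for which no other scheme offers both higher privacy and lower loss.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data set: each record is (age, zip code).
# Neither the records nor the hierarchy below come from the paper.
records = [(23, "80201"), (27, "80203"), (25, "80207"),
           (41, "80521"), (45, "80523"), (44, "80525"),
           (33, "80112"), (36, "80111")]

def generalize(rec, age_level, zip_level):
    """Generalize one record: coarsen age into bands, mask zip digits."""
    age, zipc = rec
    band = [1, 5, 10, 100][age_level]        # band width at each age level
    g_age = (age // band) * band
    g_zip = zipc[:5 - zip_level] + "*" * zip_level
    return (g_age, g_zip)

def k_of(table):
    """Privacy objective: size of the smallest equivalence class."""
    return min(Counter(table).values())

def loss_of(age_level, zip_level):
    """Assumed information-loss proxy: normalized generalization height."""
    return age_level / 3 + zip_level / 5

# Score every candidate scheme on both objectives.
points = []
for a, z in product(range(4), range(6)):
    table = [generalize(r, a, z) for r in records]
    points.append(((a, z), k_of(table), loss_of(a, z)))

# Pareto filter: keep schemes not dominated on (k up, loss down).
pareto = [p for p in points
          if not any((q[1] >= p[1] and q[2] < p[2]) or
                     (q[1] > p[1] and q[2] <= p[2])
                     for q in points)]
for scheme, k, loss in sorted(pareto, key=lambda p: p[2]):
    print(f"levels={scheme}  k={k}  loss={loss:.2f}")
```

The printed Pareto front is exactly the kind of trade-off information the paper argues a data publisher needs: instead of fixing k in advance, the publisher can inspect how much additional privacy each increment of information loss buys.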