Efficient k-anonymization using clustering techniques

  • Authors:
  • Ji-Won Byun;Ashish Kamra;Elisa Bertino;Ninghui Li

  • Affiliations:
  • CERIAS and Computer Science, Purdue University;CERIAS and Electrical and Computer Engineering, Purdue University;CERIAS and Computer Science, Purdue University;CERIAS and Computer Science, Purdue University

  • Venue:
  • DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
  • Year:
  • 2007

Abstract

k-anonymization techniques have been the focus of intense research in the last few years. An important requirement for such techniques is to ensure anonymization of data while minimizing the information loss that results from data modifications. In this paper we propose an approach that uses clustering to minimize information loss and thus ensure good data quality. The key observation is that data records that are naturally similar to each other should belong to the same equivalence class. We therefore formulate a specific clustering problem, referred to as the k-member clustering problem, in which every cluster must contain at least k records. We prove that this problem is NP-hard and present a greedy heuristic whose complexity is O(n^2). As part of our approach we develop a suitable metric to estimate the information loss introduced by generalizations, which works for both numeric and categorical data.
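The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm or metric: the function names `info_loss` and `greedy_k_member` are hypothetical, the loss for a numeric attribute is assumed to be the cluster's value range divided by the attribute's range over the whole table, and a categorical attribute is assumed to cost 1 whenever a cluster mixes different values (a flat stand-in for a generalization hierarchy). Seeds are chosen deterministically here for reproducibility, and the sketch makes no attempt to match the stated complexity bound.

```python
def info_loss(cluster, records, numeric_idx, categorical_idx, ranges):
    """Assumed loss: cluster size times the sum of per-attribute penalties."""
    if not cluster:
        return 0.0
    loss = 0.0
    for j in numeric_idx:
        vals = [records[i][j] for i in cluster]
        if ranges[j] > 0:
            # Width of the generalized interval, normalized by the table-wide range.
            loss += (max(vals) - min(vals)) / ranges[j]
    for j in categorical_idx:
        # Flat penalty: 1 if the cluster mixes values, 0 otherwise (assumption).
        if len({records[i][j] for i in cluster}) > 1:
            loss += 1.0
    return len(cluster) * loss


def greedy_k_member(records, k, numeric_idx, categorical_idx):
    """Greedily partition record indices into clusters of at least k members."""
    n = len(records)
    if k < 1 or n < k:
        raise ValueError("need at least k records and k >= 1")
    ranges = {
        j: max(r[j] for r in records) - min(r[j] for r in records)
        for j in numeric_idx
    }

    def cost(cluster):
        return info_loss(cluster, records, numeric_idx, categorical_idx, ranges)

    unassigned = set(range(n))
    clusters = []
    seed = 0  # deterministic starting point (assumption; a random pick also works)
    while len(unassigned) >= k:
        # Start the next cluster from the record most dissimilar to the last seed.
        seed = max(sorted(unassigned), key=lambda i: cost([seed, i]))
        unassigned.remove(seed)
        cluster = [seed]
        while len(cluster) < k:
            # Add the record that keeps the cluster's loss smallest.
            best = min(sorted(unassigned), key=lambda i: cost(cluster + [i]))
            unassigned.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    # Fewer than k records remain: attach each to the cluster it disturbs least.
    for i in sorted(unassigned):
        target = min(clusters, key=lambda c: cost(c + [i]) - cost(c))
        target.append(i)
    return clusters


# Hypothetical toy table: (age, gender).
table = [(25, "M"), (27, "M"), (26, "M"), (60, "F"), (62, "F"), (61, "F"), (59, "F")]
groups = greedy_k_member(table, k=3, numeric_idx=[0], categorical_idx=[1])
print(sorted(sorted(g) for g in groups))
```

Each returned cluster would then be generalized to a single equivalence class, for example by replacing ages with the cluster's minimum-to-maximum interval. Because every cluster has at least k members, the resulting table satisfies k-anonymity with respect to the attributes used.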