On the complexity of optimal K-anonymity

  • Authors:
  • Adam Meyerson; Ryan Williams

  • Affiliations:
  • University of California, Los Angeles, CA; Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2004

Abstract

The technique of k-anonymization has been proposed in the literature as an alternative way to release public information while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version, which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial-time algorithm for optimal k-anonymity that achieves an approximation ratio independent of the size of the database when k is constant. In particular, it is an O(k log k)-approximation, where the constant in the big-O is no more than 4. However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this condition, but is an O(k log m)-approximation, where m is the degree of the relation. We believe this algorithm could potentially be quite fast in practice.
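
To make the suppression version concrete, here is a minimal Python sketch. It assumes the usual definition of k-anonymity (every row of the released relation is identical to at least k - 1 other rows) and measures cost as the number of suppressed entries. The names `STAR`, `is_k_anonymous`, and `suppress_by_groups` are hypothetical and do not come from the paper. The sketch only evaluates a grouping supplied by the caller; it does not search for the optimal grouping, which the abstract states is NP-hard, and it is not the paper's approximation algorithm.

```python
from collections import Counter

STAR = "*"  # hypothetical marker for a suppressed (deleted) entry


def is_k_anonymous(rows, k):
    """Return True if every row is identical to at least k - 1 other rows."""
    counts = Counter(tuple(r) for r in rows)
    return all(c >= k for c in counts.values())


def suppress_by_groups(rows, groups):
    """Within each group of row indices, suppress every column on which
    the group's rows disagree.

    Returns (new_rows, cost), where cost is the number of suppressed entries.
    The grouping is an input here (an assumption of this sketch); choosing
    the grouping that minimizes cost is the hard part of the problem.
    """
    out = [list(r) for r in rows]
    cost = 0
    for group in groups:
        for col in range(len(rows[0])):
            values = {rows[i][col] for i in group}
            if len(values) > 1:
                for i in group:
                    out[i][col] = STAR
                    cost += 1
    return [tuple(r) for r in out], cost


# Toy relation with m = 3 attributes; all four rows are distinct.
rows = [
    ("a", "x", "1"),
    ("a", "y", "1"),
    ("b", "z", "2"),
    ("b", "z", "3"),
]

# One candidate grouping for k = 2: pair rows 0-1 and rows 2-3.
anonymized, cost = suppress_by_groups(rows, [[0, 1], [2, 3]])
```

With this grouping, rows 0 and 1 lose their second attribute and rows 2 and 3 lose their third, so four entries are suppressed and the result passes `is_k_anonymous(anonymized, 2)`.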