Optimal multivariate 2-microaggregation for microdata protection: a 2-approximation

  • Authors:
  • Josep Domingo-Ferrer;Francesc Sebé

  • Affiliations:
  • Department of Computer Engineering and Maths, Rovira i Virgili University of Tarragona, Tarragona, Catalonia;Department of Computer Engineering and Maths, Rovira i Virgili University of Tarragona, Tarragona, Catalonia

  • Venue:
  • PSD'06 Proceedings of the 2006 CENEX-SDC project international conference on Privacy in Statistical Databases
  • Year:
  • 2006

Abstract

Microaggregation is a special clustering problem where the goal is to cluster a set of points into groups of at least k points in such a way that the groups are as homogeneous as possible. Microaggregation arises in connection with the anonymization of statistical databases for privacy protection (k-anonymity), where points correspond to database records. A usual group homogeneity criterion is minimization of the within-groups sum of squares (SSE). For multivariate points, optimal microaggregation, i.e. microaggregation with minimum SSE, has been shown to be NP-hard. Recently, a polynomial-time O(k^3)-approximation heuristic has been proposed (previous heuristics in the literature offered no approximation bounds). The special case k=2 (2-microaggregation) is interesting in privacy protection scenarios with neither internal intruders nor outliers, because smaller groups imply lower information loss. For 2-microaggregation, the existing general approximation can only guarantee a 54-approximation. We give here a new polynomial-time heuristic whose SSE is at most twice the minimum SSE (2-approximation).
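
To make the SSE criterion concrete, the following is a minimal Python sketch that computes the within-groups sum of squares for a given partition and checks the minimum group size k. It is not the heuristic proposed in the paper; the function names (`centroid`, `sse`, `is_valid_microaggregation`) and the four sample points are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(group: List[Point]) -> List[float]:
    """Component-wise mean of the points in one group."""
    n = len(group)
    dim = len(group[0])
    return [sum(p[d] for p in group) / n for d in range(dim)]


def sse(groups: List[List[Point]]) -> float:
    """Within-groups sum of squares: squared distance of each point to its group centroid."""
    total = 0.0
    for group in groups:
        c = centroid(group)
        for p in group:
            total += sum((x - y) ** 2 for x, y in zip(p, c))
    return total


def is_valid_microaggregation(groups: List[List[Point]], k: int) -> bool:
    """A partition is a valid k-microaggregation if every group has at least k points."""
    return all(len(group) >= k for group in groups)


# Hypothetical sample data: four points in the plane (assumption, not from the paper).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Two candidate 2-microaggregations of the same points.
good = [[points[0], points[1]], [points[2], points[3]]]  # pairs nearby points
bad = [[points[0], points[2]], [points[1], points[3]]]   # pairs distant points

print(is_valid_microaggregation(good, 2), sse(good))
print(is_valid_microaggregation(bad, 2), sse(bad))
```

Both partitions satisfy the group-size constraint for k=2, but pairing nearby points yields a much smaller SSE, which is the quantity an optimal 2-microaggregation minimizes.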