Efficient multivariate data-oriented microaggregation

  • Authors:
  • Josep Domingo-Ferrer;Antoni Martínez-Ballesté;Josep Maria Mateo-Sanz;Francesc Sebé

  • Affiliations:
  • Department of Computer Engineering & Maths, Rovira i Virgili University of Tarragona, Catalonia;Department of Computer Engineering & Maths, Rovira i Virgili University of Tarragona, Catalonia;Statistics Group, Rovira i Virgili University of Tarragona, Catalonia;Department of Computer Engineering & Maths, Rovira i Virgili University of Tarragona, Catalonia

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2006

Abstract

Microaggregation is a family of methods for statistical disclosure control (SDC) of microdata (records on individuals and/or companies), that is, for masking microdata so that they can be released while preserving the privacy of the underlying individuals. The principle of microaggregation is to aggregate original database records into small groups prior to publication. Each group should contain at least k records to prevent disclosure of individual information, where k is a constant value preset by the data protector. Recently, microaggregation has been shown to be useful for achieving k-anonymity, in addition to being a good masking method. Optimal microaggregation (with minimum within-groups variability loss) can be computed in polynomial time for univariate data. Unfortunately, for multivariate data it is an NP-hard problem. Several heuristic approaches to microaggregation have been proposed in the literature. Heuristics yielding groups with fixed size k tend to be more efficient, whereas data-oriented heuristics yielding variable group size tend to result in lower information loss. This paper presents new data-oriented heuristics which improve on the trade-off between computational complexity and information loss and are thus usable for large datasets.
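
To make the grouping principle in the abstract concrete, the following is a minimal sketch of a naive fixed-size heuristic on univariate data. It is not one of the heuristics presented in the paper, and it is not the optimal univariate algorithm either: it simply sorts the values, cuts them into consecutive groups of k records (the last group absorbs any remainder), and replaces each value by its group mean. The function names `microaggregate_fixed` and `sse`, and the use of the within-group sum of squared errors as the information-loss measure, are assumptions made for illustration.

```python
from statistics import mean


def microaggregate_fixed(values, k):
    """Naive fixed-size univariate microaggregation (illustrative only).

    Sorts the records, forms consecutive groups of k, lets the last group
    absorb the remainder, and replaces each value by its group mean.
    Returns the masked values (in the original record order) and the groups
    as lists of record indices.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need at least k records")
    # Record indices ordered by value, so masked values keep original positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    masked = [0.0] * len(values)
    groups = []
    for g in range(n_groups):
        start = g * k
        # The last group takes all leftover records, so every group has >= k.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        idx = order[start:end]
        centroid = mean(values[i] for i in idx)
        for i in idx:
            masked[i] = centroid
        groups.append(idx)
    return masked, groups


def sse(values, masked):
    """Within-group sum of squared errors, used here as information loss."""
    return sum((v - m) ** 2 for v, m in zip(values, masked))


original = [1, 2, 3, 10, 11, 12, 13]
masked, groups = microaggregate_fixed(original, k=3)
loss = sse(original, masked)
```

With seven records and k = 3, this sketch produces two groups of sizes 3 and 4. A data-oriented heuristic of the kind the abstract describes would instead let group sizes follow the natural clusters in the data, which is where the lower information loss comes from.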