New Record Ordering Heuristics for Multivariate Microaggregation.

  • Authors:
  • Brook Heaton

  • Affiliations:
  • -

  • Venue:
  • New Record Ordering Heuristics for Multivariate Microaggregation.
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Microaggregation is a method of statistical disclosure control that attempts to reconcile the need to release information to researchers with the need to protect privacy of individual records in a dataset. Under microaggregation, records are divided into groups containing at least k members. Actual data values of the members are replaced by the mean value of the group, such that each record in the group is indistinguishable from at least k-1 other records. The goal of microaggregation is to create groups of similar records such that information loss is minimized, where information loss is the sum squared deviation between the actual data values and the group means. Optimal multivariate microaggregation is an NP-hard problem, and heuristics have been proposed to generate solutions in reasonable running time. New heuristics are desirable for either producing groups with lower information loss, or for producing groups with similar information loss but lower computational complexity. Some of the best performing existing microaggregation heuristics are based on record ordering, since it has been proven that for a given ordering of records, the optimal set of groups for that particular ordering can be efficiently computed. This dissertation improves on previous heuristics that order records in a dataset and subsequently use this record ordering to generate high quality microaggregated k-partitions. This was accomplished by using heuristics from the traveling salesman problem (TSP) literature in order to more effectively order the records. In particular, two tour construction heuristics---the Greedy heuristic and the Quick Boruvka heuristic---that are comparable in complexity to extant microaggregation methods were investigated. Next, three tour improvement heuristics---2-opt, 3-opt, and Lin-Kernighan---were used on the tours constructed to investigate whether further reduction in information loss could be achieved. The tour improvement heuristics---particularly the 3-opt and Lin-Kernighan heuristics---provided microaggregation solutions better than the best previous known solutions across several datasets and values of k.