On the Approximability of Geometric and Geographic Generalization and the Min-Max Bin Covering Problem

  • Authors:
  • Wenliang Du;David Eppstein;Michael T. Goodrich;George S. Lueker

  • Affiliations:
  • Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, 13244;Dept. of Computer Science, Univ. of California, Irvine, 92697-3435;Dept. of Computer Science, Univ. of California, Irvine, 92697-3435;Dept. of Computer Science, Univ. of California, Irvine, 92697-3435

  • Venue:
  • WADS '09 Proceedings of the 11th International Symposium on Algorithms and Data Structures
  • Year:
  • 2009

Abstract

We study the problem of abstracting a table of data about individuals so that no selection query can identify fewer than k individuals. We show that it is impossible to achieve arbitrarily good polynomial-time approximations for a number of natural variations of the generalization technique, unless P = NP, even when the table has only a single quasi-identifying attribute that represents a geographic or unordered attribute:

  • Zip-codes: nodes of a planar graph generalized into connected subgraphs
  • GPS coordinates: points in R^2 generalized into non-overlapping rectangles
  • Unordered data: text labels that can be grouped arbitrarily

These hard single-attribute instances of generalization problems contrast with the previously known NP-hard instances, which require the number of attributes to be proportional to the number of individual records (the rows of the table). In addition to impossibility results, we provide approximation algorithms for these difficult single-attribute generalization problems, which, of course, apply to multiple-attribute instances with one that is quasi-identifying. Incidentally, the generalization problem for unordered data can be viewed as a novel type of bin packing problem, min-max bin covering, which may be of independent interest.
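To make the min-max bin covering idea concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the natural reading of the unordered-data case: each text label is an item whose size is the number of individuals carrying that label, every group (bin) must reach a total of at least k, and the goal is to keep the largest group as small as possible. The function name `min_max_bin_cover` and the exhaustive search are illustrative assumptions; the search is exponential and only suitable for tiny instances.

```python
from typing import List, Optional, Tuple


def min_max_bin_cover(sizes: List[int], k: int) -> Optional[Tuple[int, List[List[int]]]]:
    """Brute-force sketch (hypothetical helper, not from the paper).

    Tries every partition of `sizes` into bins, keeps only partitions in
    which each bin total is at least k, and returns the one whose largest
    bin total is smallest, as (largest total, bins). Returns None when no
    feasible partition exists.
    """
    best: Optional[Tuple[int, List[List[int]]]] = None

    def place(i: int, bins: List[List[int]]) -> None:
        nonlocal best
        if i == len(sizes):
            totals = [sum(b) for b in bins]
            # Feasible only if every bin is "covered" (total >= k).
            if bins and all(t >= k for t in totals):
                worst = max(totals)
                if best is None or worst < best[0]:
                    best = (worst, [list(b) for b in bins])
            return
        item = sizes[i]
        # Option 1: put the item into one of the existing bins.
        for b in bins:
            b.append(item)
            place(i + 1, bins)
            b.pop()
        # Option 2: open a new bin for the item.
        bins.append([item])
        place(i + 1, bins)
        bins.pop()

    place(0, [])
    return best


if __name__ == "__main__":
    # Five labels with 3, 3, 2, 2, 2 individuals each; every group needs >= 4.
    print(min_max_bin_cover([3, 3, 2, 2, 2], 4))
```

In this example three groups of exactly 4 are impossible, so the best feasible grouping uses two groups of total 6 each. The exhaustive search is chosen only for clarity; the abstract's hardness results concern polynomial-time approximation, which this sketch does not attempt.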