On-the-fly hierarchies for numerical attributes in data anonymization

  • Authors:
  • Alina Campan; Nicholas Cooper

  • Affiliations:
  • Department of Computer Science, Northern Kentucky University, Highland Heights, KY; Department of Computer Science, Northern Kentucky University, Highland Heights, KY

  • Venue:
  • SDM'10 Proceedings of the 7th VLDB conference on Secure data management
  • Year:
  • 2010

Abstract

We present a method for dynamically creating hierarchies for numerical quasi-identifier attributes. The resulting hierarchies can be used for generalization in microdata k-anonymization, or to let users define generalization boundaries for constrained k-anonymity. A hierarchy for a numerical attribute is constructed by hierarchical agglomerative clustering of that attribute's values in the dataset to be anonymized. The resulting tree therefore reflects the closeness and clustering tendency of the attribute's values in that dataset. Because of this, microdata anonymized through generalization based on these on-the-fly hierarchies retains its quality well, and the information loss incurred by anonymization stays within reasonable bounds, as shown experimentally.
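
The idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state which linkage criterion or distance the authors use, so the code below assumes single linkage (the gap between neighbouring clusters) purely for illustration. The names `Node`, `build_hierarchy`, and `generalizations` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One hierarchy node: the interval [lo, hi] covered by a cluster."""
    lo: float
    hi: float
    children: List["Node"] = field(default_factory=list)


def build_hierarchy(values) -> Node:
    """Agglomeratively cluster 1-D values into a binary interval tree.

    Assumption: single linkage, i.e. the distance between two clusters is
    the gap between their nearest endpoints. The paper may use another rule.
    """
    clusters = [Node(v, v) for v in sorted(set(values))]
    if not clusters:
        raise ValueError("need at least one value")
    while len(clusters) > 1:
        # In one dimension the closest pair of clusters is always adjacent
        # in sorted order, so only neighbouring gaps need to be compared.
        i = min(range(len(clusters) - 1),
                key=lambda j: clusters[j + 1].lo - clusters[j].hi)
        left, right = clusters[i], clusters[i + 1]
        clusters[i:i + 2] = [Node(left.lo, right.hi, [left, right])]
    return clusters[0]


def generalizations(root: Node, value) -> List[Tuple[float, float]]:
    """List the intervals containing `value`, from most to least general."""
    path = []
    node = root
    while True:
        path.append((node.lo, node.hi))
        for child in node.children:
            if child.lo <= value <= child.hi:
                node = child
                break
        else:
            # Leaf reached, or value falls in a gap between the children.
            return path
```

For the ages `[20, 21, 22, 40, 41, 65]`, this sketch groups 20 to 22 and 40 to 41 before joining them with the outlier 65, so `generalizations(root, 40)` walks from the interval (20, 65) down through (20, 41) and (40, 41) to the exact value. An anonymizer could pick any interval on that path as the generalized value, trading precision for privacy.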