On-the-fly generalization hierarchies for numerical attributes revisited

  • Authors:
  • Alina Campan;Nicholas Cooper;Traian Marius Truta

  • Affiliations:
  • Department of Computer Science, Northern Kentucky University, Highland Heights, KY;Department of Computer Science, Northern Kentucky University, Highland Heights, KY;Department of Computer Science, Northern Kentucky University, Highland Heights, KY

  • Venue:
  • SDM'11: Proceedings of the 8th VLDB International Conference on Secure Data Management
  • Year:
  • 2011

Abstract

Generalization hierarchies are frequently used in computer science, statistics, biology, bioinformatics, and other areas when less specific values are needed for data analysis. Generalization is also one of the most widely used disclosure control techniques for anonymizing data. For numerical attributes, generalization is performed either with predefined generalization hierarchies or with a hierarchy-free model. Because hierarchy-free generalization is not suitable for anonymization in every scenario, generalization hierarchies are of particular interest for data anonymization. Traditionally, these hierarchies were created by the data owner with help from domain experts. While it is feasible to construct a small hierarchy by hand, the effort increases for hierarchies that have many levels. Newer approaches therefore generate these numerical hierarchies automatically, on the fly. In this paper we extend an existing method for creating on-the-fly generalization hierarchies, present several existing information loss measures used to assess the quality of anonymized data, and report a series of experiments showing that our new method improves on existing methods for automatically generating on-the-fly numerical generalization hierarchies.
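
To make the idea of an on-the-fly numerical generalization hierarchy concrete, the following is a minimal illustrative sketch, not the method proposed in the paper. It assumes a simple bottom-up strategy: start with one interval per distinct value and repeatedly merge the adjacent pair whose combined interval is narrowest, so that each merge adds one level. The function names `build_hierarchy` and `generalize` and the sample ages are hypothetical.

```python
def build_hierarchy(values):
    """Return a list of levels for a numerical attribute.

    Level 0 holds one degenerate interval per distinct value; each later
    level merges one adjacent pair; the last level is a single interval
    covering the whole observed domain. (Illustrative strategy only.)
    """
    # Each interval is a (low, high) pair.
    level = [(v, v) for v in sorted(set(values))]
    levels = [level]
    while len(level) > 1:
        # Choose the adjacent pair whose merged interval would be narrowest.
        i = min(range(len(level) - 1),
                key=lambda j: level[j + 1][1] - level[j][0])
        merged = (level[i][0], level[i + 1][1])
        level = level[:i] + [merged] + level[i + 2:]
        levels.append(level)
    return levels


def generalize(value, level):
    """Map a value to the interval that contains it at the given level."""
    for low, high in level:
        if low <= value <= high:
            return (low, high)
    raise ValueError(f"{value} is not covered at this level")


# Hypothetical sample data: ages observed in a dataset.
ages = [20, 22, 25, 40, 41, 60]
levels = build_hierarchy(ages)
print(generalize(22, levels[3]))  # less specific value for age 22
print(levels[-1])                 # root of the hierarchy
```

Under these assumptions the hierarchy is derived from the data itself rather than supplied by a domain expert, and replacing a value with a higher-level interval trades precision for stronger anonymization. Values that fall in a gap between intervals are not covered at intermediate levels, which is acceptable here because only observed values are generalized.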