Data anonymization using an improved utility measurement

  • Authors:
  • Stuart Morton; Malika Mahoui; P. Joseph Gibson

  • Affiliations:
  • IUPUI, Indianapolis, IN, USA; IUPUI, Indianapolis, IN, USA; Marion County Public Health Department, Indianapolis, IN, USA

  • Venue:
  • Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium
  • Year:
  • 2012

Abstract

As medical data continues to move to electronic formats, researchers gain opportunities to mine this microdata for patterns that can improve patient care. Protecting the identities of the patients contained in these databases is therefore more critical than ever. Even after obvious "identifier" attributes that clearly identify a specific person, such as social security numbers or first and last names, have been removed, "quasi-identifier" attributes can be joined across two or more publicly available databases to re-identify individuals. K-anonymity is an established approach for ensuring that no individual can be distinguished within a group of at least k individuals. Most proposed k-anonymity approaches have focused on improving algorithmic efficiency; less emphasis has been placed on ensuring the "utility" of the anonymized data from a researcher's perspective. We propose a data utility measurement, called the research value (RV), which evaluates how well common cutoffs for numerical data or groupings in categorical data are preserved during the anonymization process. The proposed algorithm, which uses the new utility function, scales efficiently when the number of attributes is large, while still ensuring that the generalization process is dictated by the data content expert's assessment of the utility of the generalized data.
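
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. The function `is_k_anonymous` checks the standard k-anonymity condition over a chosen set of quasi-identifiers. The function `interval_preserves_cutoffs` is a hypothetical illustration of what "preserving common cutoffs" could mean for a generalized numeric range; it is an assumption made for this example and is not the paper's definition of the research value (RV). The record fields (`age`, `zip`, `diagnosis`) and the cutoff values are likewise invented for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in `records` is shared by at least k records."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


def interval_preserves_cutoffs(interval, cutoffs):
    """Hypothetical helper (not the paper's RV): a generalized range
    (low, high) preserves the cutoffs if no cutoff falls strictly
    inside it, so records on either side of a cutoff stay separable."""
    low, high = interval
    return not any(low < c < high for c in cutoffs)


# Illustrative, made-up records with already generalized quasi-identifiers.
records = [
    {"age": "60-64", "zip": "462**", "diagnosis": "diabetes"},
    {"age": "60-64", "zip": "462**", "diagnosis": "asthma"},
    {"age": "65-69", "zip": "462**", "diagnosis": "flu"},
    {"age": "65-69", "zip": "462**", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(records, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(records, ["age", "zip"], 3)

# Assumed research cutoffs at ages 18 and 65.
narrow_ok = interval_preserves_cutoffs((60, 65), [18, 65])
wide_ok = interval_preserves_cutoffs((60, 70), [18, 65])
```

In this sketch the table is 2-anonymous but not 3-anonymous on `age` and `zip`. The range 60 to 65 keeps the assumed age-65 cutoff intact, while merging it into 60 to 70 would blur that cutoff, which is the kind of utility loss a cutoff-aware measure is meant to penalize.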