Information based data anonymization for classification utility

  • Authors:
  • Jiuyong Li; Jixue Liu; Muzammil Baig; Raymond Chi-Wing Wong

  • Affiliations:
  • School of Computer & Information Science, University of South Australia, Australia; School of Computer & Information Science, University of South Australia, Australia; School of Computer & Information Science, University of South Australia, Australia; Department of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2011

Abstract

Anonymization is a practical approach to protecting privacy in data. The major objective of privacy-preserving data publishing is to protect private information while keeping the data useful for intended applications, such as building classification models. In this paper, we argue that data generalization in anonymization should be determined by the classification capability of the data rather than by the privacy requirement. We use mutual information to measure classification capability for generalization, and propose two k-anonymity algorithms that produce anonymized tables for building accurate classification models. The algorithms generalize attributes to maximize classification capability, and then suppress values according to a privacy requirement k (IACk) or distributional constraints (IACc). Experimental results show that IACk supports more accurate classification models and runs faster than a benchmark utility-aware data anonymization algorithm.
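
The idea of scoring a generalization by mutual information and then suppressing by k can be illustrated with a minimal sketch. This is not the IACk or IACc algorithm from the paper; it only shows, on made-up toy data, how the mutual information between a generalized attribute and the class label can rank two candidate generalizations, and how values occurring fewer than k times might then be suppressed. All function names, mappings, and data below are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(values, labels):
    """Empirical mutual information I(A; C) in bits between an
    attribute column and a class-label column of equal length."""
    n = len(values)
    joint = Counter(zip(values, labels))
    count_a = Counter(values)
    count_c = Counter(labels)
    mi = 0.0
    for (a, c), n_ac in joint.items():
        # p(a,c) * log2( p(a,c) / (p(a) * p(c)) ), written with raw counts
        mi += (n_ac / n) * log2(n_ac * n / (count_a[a] * count_c[c]))
    return mi


def generalize(values, mapping):
    """Replace each value by its (hypothetical) generalized value."""
    return [mapping[v] for v in values]


def suppress_small_groups(values, k):
    """Drop values that occur fewer than k times (a simplified
    stand-in for suppression under a k-anonymity requirement)."""
    counts = Counter(values)
    return [v for v in values if counts[v] >= k]


# Toy data (assumed for illustration only).
ages = ["20s", "20s", "30s", "30s", "40s", "40s"]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

# Two candidate generalizations of the same attribute.
candidate_a = {"20s": "<30", "30s": "30+", "40s": "30+"}
candidate_b = {"20s": "<40", "30s": "<40", "40s": "40+"}

mi_a = mutual_information(generalize(ages, candidate_a), labels)
mi_b = mutual_information(generalize(ages, candidate_b), labels)

# Prefer the generalization that retains more information about the class.
best = candidate_a if mi_a >= mi_b else candidate_b
anonymized = suppress_small_groups(generalize(ages, best), k=2)
print(mi_a, mi_b, anonymized)
```

In this toy case the first candidate separates the two classes perfectly, so it scores higher and would be chosen before suppression is applied. A real implementation would operate on multi-attribute records and generalization hierarchies rather than a single column.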