Optimal random perturbation at multiple privacy levels

  • Authors:
  • Xiaokui Xiao;Yufei Tao;Minghua Chen

  • Affiliations:
  • Nanyang Technological University, Singapore;Chinese University of Hong Kong, New Territories, Hong Kong;Chinese University of Hong Kong, New Territories, Hong Kong

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2009

Quantified Score

Hi-index 0.03

Visualization

Abstract

Random perturbation is a popular method of computing anonymized data for privacy preserving data mining. It is simple to apply, ensures strong privacy protection, and permits effective mining of a large variety of data patterns. However, all the existing studies with good privacy guarantees focus on perturbation at a single privacy level. Namely, a fixed degree of privacy protection is imposed on all anonymized data released by the data holder. This drawback seriously limits the applicability of random perturbation in scenarios where the holder has numerous recipients to which different privacy levels apply. Motivated by this, we study the problem of multi-level perturbation, whose objective is to release multiple versions of a dataset anonymized at different privacy levels. The challenge is that various recipients may collude by sharing their data to infer privacy beyond their permitted levels. Our solution overcomes this obstacle, and achieves two crucial properties. First, collusion is useless, meaning that the colluding recipients cannot learn anything more than what the most trustable recipient (among the colluding recipients) already knows alone. Second, the data each recipient receives can be regarded (and hence, analyzed in the same way) as the output of conventional uniform perturbation. Besides its solid theoretical foundation, the proposed technique is both space economical and computationally efficient. It requires O (n+m) expected space, and produces a new anonymized version in O (n + log m) expected time, where n is the cardinality of the original dataset, and m the number of versions released previously. Both bounds are optimal under the realistic assumption that n » m.