Privacy-preserving sequential pattern release

  • Authors:
  • Huidong Jin, Jie Chen, Hongxing He (CSIRO Mathematical and Information Sciences, Canberra, ACT, Australia)
  • Christine M. O'Keefe (CSIRO Preventative Health National Research Flagship, Canberra, ACT, Australia)

  • Venue:
  • PAKDD'07: Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2007


Abstract

We investigate situations where releasing frequent sequential patterns can compromise individuals' privacy. We propose two concrete objectives for privacy protection: k-anonymity and α-dissociation. The first addresses the problem of inferring patterns with very low support, say, in [1, k]; such inferred patterns can become quasi-identifiers in linking attacks. We show that, for all but one definition of support, it is impossible to reliably infer support values for patterns with two or more negative items (items which do not occur in a pattern) solely from the released frequent sequential patterns. For the remaining definition, we formulate privacy inference channels. α-dissociation handles the problem of inferring sensitive attribute values with high certainty. To remove privacy threats with respect to these two objectives, we show that it suffices to examine pairs of sequential patterns whose lengths differ by 1. We then develop a Privacy Inference Channels Sanitisation (PICS) algorithm. Experiments illustrate that it can reduce the privacy disclosure risk carried by frequent sequential patterns with a small computational overhead.
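The inference-channel check described in the abstract (examining pairs of released patterns whose lengths differ by 1, and flagging inferred supports that fall in [1, k]) can be sketched as below. The function names, the dictionary-of-supports data layout, and the simple subsequence test are illustrative assumptions, not the paper's actual PICS implementation:

```python
def is_subsequence(short, long_):
    """True if `short` occurs as a (not necessarily contiguous)
    subsequence of `long_`."""
    it = iter(long_)
    return all(item in it for item in short)

def find_inference_channels(patterns, k):
    """patterns: dict mapping a sequential pattern (tuple of items)
    to its released support count.

    For a released pair (P, P+e) differing in length by 1, an attacker
    can bound supp(P with NOT e) by supp(P) - supp(P+e). If that
    inferred support lands in [1, k], the pattern with the negative
    item violates k-anonymity, so we flag the pair as a privacy
    inference channel (a sanitiser would then adjust the released
    supports to close it)."""
    channels = []
    for sup_pat, sup in patterns.items():
        for sub_pat, sub_sup in patterns.items():
            if (len(sup_pat) == len(sub_pat) + 1
                    and is_subsequence(sub_pat, sup_pat)):
                inferred = sub_sup - sup
                if 1 <= inferred <= k:
                    channels.append((sub_pat, sup_pat, inferred))
    return channels
```

For example, releasing supp(⟨a⟩) = 10 and supp(⟨a, b⟩) = 9 reveals that exactly one sequence contains a but not b, a support of 1 that an attacker could exploit in a linking attack.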