We investigate situations where releasing frequent sequential patterns can compromise individuals' privacy. We propose two concrete objectives for privacy protection: k-anonymity and α-dissociation. The first addresses the problem of inferring patterns with very low support, say in [1, k]; such inferred patterns can serve as quasi-identifiers in linking attacks. We show that, for all but one definition of support, it is impossible to reliably infer support values for patterns with two or more negative items (items that do not occur in a pattern) solely from the released frequent sequential patterns. For the remaining definition, we formulate privacy inference channels. α-dissociation addresses the problem of inferring sensitive attribute values with high certainty. To remove privacy threats with respect to these two objectives, we show that it suffices to examine pairs of sequential patterns whose lengths differ by 1. We then present a Privacy Inference Channels Sanitisation (PICS) algorithm which, as our experiments illustrate, reduces the privacy disclosure risk carried by frequent sequential patterns at a small computational overhead.
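To make the pattern-pair check concrete, below is a minimal sketch, not the authors' implementation, of how inference channels between patterns with length difference 1 could be detected. It assumes patterns are tuples of items, supports are absolute counts, and the support of a pattern extended by one negative item obeys supp(P, ¬x) = supp(P) − supp(P + x) under a straightforward support definition; for simplicity only the pair (pattern minus its last item, pattern) is examined, and all names (find_inference_channels, k, alpha, sensitive) are hypothetical.

```python
# Hedged sketch: flagging privacy inference channels among frequent
# sequential patterns, assuming absolute support counts and the identity
#   supp(P, NOT x) = supp(P) - supp(P + x)
# for a single appended negative item. Names are illustrative, not from
# the paper.

def find_inference_channels(supports, k, alpha, sensitive):
    """supports: dict mapping pattern (tuple of items) -> absolute support.
    Flags two kinds of threats:
      * k-anonymity: an inferable negative-item pattern with support in
        [1, k), i.e. a potential quasi-identifier for linking attacks;
      * alpha-dissociation: a sensitive item inferred from its prefix
        pattern with confidence >= alpha.
    Only pattern pairs whose lengths differ by 1 are examined.
    """
    channels = []
    for pattern, supp in supports.items():
        if len(pattern) < 2:
            continue
        prefix, last = pattern[:-1], pattern[-1]
        if prefix not in supports:
            continue
        prefix_supp = supports[prefix]
        # Inferred support of "prefix followed by NOT last".
        neg_supp = prefix_supp - supp
        if 1 <= neg_supp < k:
            channels.append(("k-anonymity", prefix, last, neg_supp))
        # Certainty of inferring the (sensitive) last item from the prefix.
        confidence = supp / prefix_supp
        if last in sensitive and confidence >= alpha:
            channels.append(("alpha-dissociation", prefix, last, confidence))
    return channels


if __name__ == "__main__":
    # Toy pattern set: supports are absolute counts over some database.
    supports = {
        ("a",): 100,
        ("a", "b"): 97,     # supp(a, NOT b) = 3, below k  -> channel
        ("a", "c"): 60,     # supp(a, NOT c) = 40          -> safe
        ("c",): 80,
        ("c", "hiv"): 76,   # confidence 0.95 >= alpha     -> channel
    }
    for channel in find_inference_channels(supports, k=5, alpha=0.9,
                                           sensitive={"hiv"}):
        print(channel)
```

A sanitisation step in the spirit of PICS would then perturb or suppress supports until this detector returns no channels; the sketch only covers detection, since the abstract does not specify the sanitisation operations.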