Mining frequent patterns with differential privacy

  • Authors:
  • Luca Bonomi;Li Xiong

  • Affiliations:
  • Department of Mathematics & Computer Science, Emory University, Atlanta;Department of Mathematics & Computer Science, Emory University, Atlanta

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

The mining of frequent patterns is a fundamental component in many data mining tasks. A considerable amount of research on this problem has led to a wide series of efficient and scalable algorithms for mining frequent patterns. However, releasing these patterns is posing concerns on the privacy of the users participating in the data. Indeed the information from the patterns can be linked with a large amount of data available from other sources creating opportunities for adversaries to break the individual privacy of the users and disclose sensitive information. In this proposal, we study the mining of frequent patterns in a privacy preserving setting. We first investigate the difference between sequential and itemset patterns, and second we extend the definition of patterns by considering the absence and presence of noise in the data. This leads us in distinguishing the patterns between exact and noisy. For exact patterns, we describe two novel mining techniques that we previously developed. The first approach has been applied in a privacy preserving record linkage setting, where our solution is used to mine frequent patterns which are employed in a secure transformation procedure to link records that are similar. The second approach improves the mining utility results using a two-phase strategy which allows to effectively mine frequent substrings as well as prefixes patterns. For noisy patterns, first we formally define the patterns according to the type of noise and second we provide a set of potential applications that require the mining of these patterns. We conclude the paper by stating the challenges in this new setting and possible future research directions.