Privacy Preserving Market Basket Data Analysis

  • Authors:
  • Ling Guo;Songtao Guo;Xintao Wu

  • Affiliations:
  • University of North Carolina at Charlotte,;University of North Carolina at Charlotte,;University of North Carolina at Charlotte,

  • Venue:
  • PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Randomized Response techniques have been empirically investigated in privacy preserving association rule mining. However, previous research on privacy preserving market basket data analysis was solely focused on support/ confidence framework. Since there are inherent problems with the concept of finding rules based on their support and confidence measures, many other measures (e.g., correlation, lift, etc.) for the general market basket data analysis have been studied. How those measures are affected due to distortion is not clear in the privacy preserving analysis scenario.In this paper, we investigate the accuracy (in terms of bias and variance of estimates) of estimates of various rules derived from the randomized market basket data and present a general framework which can conduct theoretical analysis on how the randomization process affects the accuracy of various measures adopted in market basket data analysis. We also show several measures (e.g., correlation) have monotonic property, i.e., the values calculated directly from the randomized data are always less or equal than those original ones. Hence, some market basket data analysis tasks can be executed on the randomized data directly without the release of distortion probabilities, which can better protect data privacy.