A privacy protection technique for publishing data mining models and research data

  • Authors:
  • Yu Fu; Zhiyuan Chen; Gunes Koru; Aryya Gangopadhyay

  • Affiliations:
  • University of Maryland Baltimore County, Baltimore, MD (all four authors)

  • Venue:
  • ACM Transactions on Management Information Systems (TMIS)
  • Year:
  • 2010

Abstract

Data mining techniques have been widely used in many research disciplines, such as medicine, the life sciences, and the social sciences, to extract useful knowledge (such as mining models) from research data. Research data often needs to be published along with the resulting mining model so that others can verify or reanalyze it. However, the published data must be privacy-protected; otherwise it is open to misuse such as linking attacks, in which published records are joined with other available data to re-identify individuals. Privacy protection methods are therefore necessary. However, such methods consider only privacy protection and do not guarantee that the same mining model can be built from the sanitized data, so the published model cannot be verified against the sanitized data. This article proposes a technique that not only protects privacy but also guarantees that the same model, in the form of a decision tree or regression tree, can be built from the sanitized data. We also show experimentally that other mining techniques can be used to reanalyze the sanitized data. This technique can be used to promote the sharing of research data.
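To make the linking-attack risk concrete, here is a minimal Python sketch. It is not the authors' technique; it only illustrates the threat their sanitization is meant to defend against. All records, names, and the helper functions `qi_key` and `linking_attack` are hypothetical. The sketch assumes the published data has had names removed but still carries quasi-identifiers (ZIP code, birth year, sex) that also appear in some public, named dataset such as a voter list.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical published research data: names removed, quasi-identifiers kept.
published: List[Record] = [
    {"zip": "21250", "birth_year": "1970", "sex": "F", "diagnosis": "diabetes"},
    {"zip": "21250", "birth_year": "1982", "sex": "M", "diagnosis": "asthma"},
    {"zip": "21228", "birth_year": "1970", "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical public records that carry names (e.g. a voter list).
public: List[Record] = [
    {"name": "Alice", "zip": "21250", "birth_year": "1970", "sex": "F"},
    {"name": "Bob", "zip": "21250", "birth_year": "1982", "sex": "M"},
    {"name": "Carol", "zip": "21228", "birth_year": "1965", "sex": "F"},
]

# Attributes present in both datasets (assumed for this illustration).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def qi_key(record: Record) -> Tuple[str, ...]:
    """Return the record's quasi-identifier values as a join key."""
    return tuple(record[attr] for attr in QUASI_IDENTIFIERS)


def linking_attack(published: List[Record], public: List[Record]) -> List[Tuple[str, str]]:
    """Join the two datasets on quasi-identifiers; a unique match re-identifies a row."""
    index: Dict[Tuple[str, ...], List[str]] = {}
    for person in public:
        index.setdefault(qi_key(person), []).append(person["name"])

    reidentified = []
    for row in published:
        names = index.get(qi_key(row), [])
        if len(names) == 1:  # exactly one candidate: the sensitive value is exposed
            reidentified.append((names[0], row["diagnosis"]))
    return reidentified


if __name__ == "__main__":
    print(linking_attack(published, public))
```

Running it prints `[('Alice', 'diabetes'), ('Bob', 'asthma')]`. The third published row has no unique match and is not re-identified. Sanitization methods such as generalizing or perturbing quasi-identifier values aim to make such joins ambiguous. The abstract's added requirement is that the sanitized data must still yield the same decision or regression tree.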