Against Classification Attacks: A Decision Tree Pruning Approach to Privacy Protection in Data Mining

  • Authors:
  • Xiao-Bai Li; Sumit Sarkar

  • Affiliations:
  • Department of Operations and Information Systems, University of Massachusetts Lowell, Lowell, Massachusetts 01854; School of Management, The University of Texas at Dallas, Richardson, Texas 75080

  • Venue:
  • Operations Research
  • Year:
  • 2009

Abstract

Data-mining techniques can be used not only to study the collective behavior of customers, but also to discover private information about individuals. In this study, we demonstrate that decision trees, a popular classification technique for data mining, can be used to effectively reveal individuals' confidential data, even when the identities of the individuals are not present in the data. We propose a novel approach for organizations to protect confidential data from such a classification attack. The key components of this approach are a set of entropy-based measures to evaluate the disclosure risks of individual records, an optimal pruning algorithm to identify high-risk records, and a pair of data-swapping procedures to reduce those risks. The proposed method provides the best trade-off between data utility and privacy protection against classification attacks. It can be applied to data with both numeric and categorical attributes. An experimental study on six real-world data sets shows that the proposed method is very effective in protecting privacy while enabling legitimate data mining and analysis.
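To make the idea of an entropy-based disclosure risk concrete, the following is a minimal sketch and not the authors' method. It assumes that records sharing the same non-confidential attribute values stand in for the records that land in one leaf of a decision tree, and it flags groups where the confidential attribute has low Shannon entropy, since an attacker could predict that value with high confidence. The function names, the sample attributes (`age`, `zip`, `diagnosis`), and the `threshold` value are all hypothetical; the paper's actual measures, its pruning algorithm, and its data-swapping procedures are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def high_risk_groups(records, quasi_keys, confidential_key, threshold=0.5):
    """Group records by their non-confidential attributes and return the
    groups whose confidential attribute has entropy <= threshold.

    Each group is a stand-in (an assumption of this sketch) for the records
    that would fall into one leaf of a decision tree.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[k] for k in quasi_keys)
        groups[key].append(rec[confidential_key])

    risky = {}
    for key, values in groups.items():
        h = entropy(values)
        if h <= threshold:  # low entropy: the confidential value is easy to guess
            risky[key] = h
    return risky


# Hypothetical, de-identified records; "diagnosis" is the confidential attribute.
records = [
    {"age": "30-39", "zip": "018", "diagnosis": "flu"},
    {"age": "30-39", "zip": "018", "diagnosis": "flu"},
    {"age": "40-49", "zip": "750", "diagnosis": "flu"},
    {"age": "40-49", "zip": "750", "diagnosis": "asthma"},
]

risky = high_risk_groups(records, ["age", "zip"], "diagnosis")
```

In this toy data, both records in the first group share one diagnosis, so the group's entropy is zero and it is flagged. The second group is evenly split between two diagnoses (entropy of one bit) and is not. A protection method in the spirit of the abstract would then alter the flagged records, for example by swapping values, while trying to preserve the data's overall usefulness.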