Against Classification Attacks: A Decision Tree Pruning Approach to Privacy Protection in Data Mining

Authors:
Xiao-Bai Li; Sumit Sarkar
Affiliations:
Department of Operations and Information Systems, University of Massachusetts Lowell, Lowell, Massachusetts 01854; School of Management, The University of Texas at Dallas, Richardson, Texas 75080
Venue:
Operations Research
Year:
2009

Abstract

Data-mining techniques can be used not only to study the collective behavior of customers but also to discover private information about individuals. In this study, we demonstrate that decision trees, a popular classification technique in data mining, can effectively reveal individuals' confidential data even when the individuals' identities are not present in the data. We propose a novel approach for organizations to protect confidential data from such a classification attack. The key components of this approach are a set of entropy-based measures that evaluate the disclosure risk of individual records, an optimal pruning algorithm that identifies high-risk records, and a pair of data-swapping procedures that reduce the disclosure risk. The proposed method provides the best trade-off between data utility and privacy protection against classification attacks, and it applies to data with both numeric and categorical attributes. An experimental study on six real-world data sets shows that the method protects privacy very effectively while still enabling legitimate data mining and analysis.
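
The abstract does not spell out the entropy-based measures, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes plain Shannon entropy of the confidential attribute, computed over groups of records that share the same non-confidential values (a rough stand-in for the leaves of a decision tree). A group with low entropy lets an attacker predict the confidential value with high confidence, so its records are treated as high risk. The function names, the `threshold` parameter, and the record layout are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def group_entropies(records, public_keys, confidential_key):
    """Group records by their non-confidential values and return the
    entropy of the confidential attribute inside each group.

    Assumption: each group plays the role of one decision tree leaf.
    """
    groups = defaultdict(list)
    for rec in records:
        group_id = tuple(rec[k] for k in public_keys)
        groups[group_id].append(rec[confidential_key])
    return {gid: shannon_entropy(vals) for gid, vals in groups.items()}


def high_risk_groups(records, public_keys, confidential_key, threshold=0.5):
    """Return the groups whose entropy falls below the (hypothetical)
    threshold, i.e. groups where the confidential value is easy to infer."""
    entropies = group_entropies(records, public_keys, confidential_key)
    return sorted(gid for gid, h in entropies.items() if h < threshold)
```

With toy records such as three people in one age and ZIP group who all share the same confidential value, that group has entropy 0 and is flagged, while a group split evenly between two values has entropy 1 bit and is not. A data-swapping step, as described in the abstract, would then target the flagged records; that step is not sketched here.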