Data-mining techniques can be used not only to study the collective behavior of customers, but also to discover private information about individuals. In this study, we demonstrate that decision trees, a popular classification technique in data mining, can be used to effectively reveal individuals' confidential data, even when the identities of the individuals are not present in the data. We propose a novel approach for organizations to protect confidential data from such a classification attack. The key components of this approach are a set of entropy-based measures that evaluate the disclosure risk of individual records, an optimal pruning algorithm that identifies high-risk records, and a pair of data-swapping procedures that reduce the disclosure risk. The proposed method provides the best trade-off between data utility and privacy protection against classification attacks, and it can be applied to data with both numeric and categorical attributes. An experimental study on six real-world data sets shows that the proposed method is very effective in protecting privacy while still enabling legitimate data mining and analysis.
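To make the idea of an entropy-based disclosure measure concrete, the following is a minimal sketch and not the paper's actual measure, which the abstract does not specify. It assumes that records have already been grouped (for example, by the leaves of a decision tree) and that a group whose confidential values have low Shannon entropy is one where an attacker could predict those values with confidence. The names `shannon_entropy`, `high_risk_groups`, the `threshold` parameter, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def high_risk_groups(groups, threshold=0.5):
    """Return keys of groups whose confidential values have entropy below `threshold`.

    Assumption for illustration: low entropy means the confidential value is
    nearly determined by group membership, so the group is treated as high risk.
    """
    return [key for key, values in groups.items()
            if shannon_entropy(values) < threshold]


# Hypothetical groups: confidential attribute values of the records in each leaf.
leaves = {
    "leaf_a": ["high", "high", "high", "high"],  # homogeneous: entropy 0
    "leaf_b": ["high", "low", "high", "low"],    # evenly mixed: entropy 1
    "leaf_c": ["high", "high", "high", "low"],   # skewed: entropy about 0.81
}

risky = high_risk_groups(leaves, threshold=0.5)
print(risky)
```

In this sketch only the homogeneous group falls below the threshold. Records in such a group would be the natural candidates for a swapping step, although how the paper selects and swaps records is not described in the abstract.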