Privacy-preserving data mining through knowledge model sharing

  • Authors:
  • Patrick Sharkey, Hongwei Tian, Weining Zhang, Shouhuai Xu

  • Affiliations:
  • Department of Computer Science, University of Texas at San Antonio (all authors)

  • Venue:
  • PinKDD'07: Proceedings of the 1st ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD
  • Year:
  • 2007

Abstract

Privacy-preserving data mining (PPDM) is an important topic for both industry and academia. In general, there are two approaches to PPDM: one is statistics-based and the other is crypto-based. The statistics-based approach has the advantage of being efficient enough to handle large volumes of data. The basic idea underlying this approach is to let the data owners publish sanitized versions of their data (e.g., via perturbation, generalization, or l-diversification), which are then used for extracting useful knowledge models such as decision trees. In this paper, we present a new method for statistics-based PPDM. Our method differs from existing ones in that it lets the data owners share with each other the knowledge models extracted from their own private datasets, rather than having the data owners publish any of their private datasets (not even in sanitized form). The knowledge models derived from the individual datasets are used to generate pseudo-data, which are then used for extracting the desired "global" knowledge models. While this approach is instrumental, there are technical subtleties that need to be carefully addressed. Specifically, we propose an algorithm for generating pseudo-data according to the paths of a decision tree, a method for adapting anonymity measures of datasets to measure the privacy of decision trees, and an algorithm that prunes a decision tree to satisfy a given anonymity requirement. Through an empirical study, we show that predictive models learned using our method are significantly more accurate than those learned using the existing l-diversity method, in both centralized and distributed environments, across different types of datasets, predictive models, and utility measures.
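
The abstract's central mechanism is generating pseudo-data from the paths of a shared decision tree. The sketch below illustrates that idea under simplifying assumptions: each root-to-leaf path is modeled as a set of attribute constraints plus a class label and a support count, and synthetic records are sampled to satisfy those constraints. All names (`Path`, `generate_pseudo_data`, the toy attribute domains) are hypothetical illustrations, not the authors' actual algorithm.

```python
import random
from dataclasses import dataclass


@dataclass
class Path:
    """One root-to-leaf path: attribute constraints, predicted class label,
    and the number of training records that reached the leaf."""
    constraints: dict   # attribute -> set of allowed values on this path
    label: str          # class predicted at the leaf
    support: int        # records covered by this path


def generate_pseudo_data(paths, domains, records_per_support=1):
    """Sample pseudo-records path by path.

    domains: attribute -> full list of possible values; attributes not
    constrained on a path are drawn uniformly from their whole domain.
    """
    pseudo = []
    for path in paths:
        for _ in range(path.support * records_per_support):
            record = {}
            for attr, values in domains.items():
                allowed = path.constraints.get(attr, values)
                record[attr] = random.choice(list(allowed))
            record["class"] = path.label
            pseudo.append(record)
    return pseudo


# Toy usage: two paths over two categorical attributes.
domains = {"age": ["young", "middle", "old"], "income": ["low", "high"]}
paths = [
    Path(constraints={"age": {"young"}}, label="no", support=3),
    Path(constraints={"age": {"old"}, "income": {"high"}}, label="yes", support=2),
]
for row in generate_pseudo_data(paths, domains):
    print(row)
```

The pseudo-data generated this way can be pooled across data owners and fed to an ordinary learner to build the "global" model; the paper's additional contributions (measuring a tree's anonymity and pruning it to meet an anonymity requirement) operate on the tree before such generation and are not shown here.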