A Knowledge Model Sharing Based Approach to Privacy-Preserving Data Mining

Authors:
Hongwei Tian; Weining Zhang; Shouhuai Xu; Patrick Sharkey
Affiliations:
Department of Computer Science, University of Texas at San Antonio (all authors). E-mail: htian@cs.utsa.edu; wzhang@cs.utsa.edu; shxu@cs.utsa.edu; psharkey@cs.utsa.edu
Venue:
Transactions on Data Privacy
Year:
2012

Abstract

Privacy-preserving data mining (PPDM) is an important problem and is currently studied through three approaches: the cryptographic approach, the data publishing approach, and the model publishing approach. However, each of these approaches has problems. The cryptographic approach does not protect the privacy of learned knowledge models and may have performance and scalability issues. The data publishing approach, although popular, may suffer from too much utility loss for certain types of data mining applications. The model publishing approach lacks efficient algorithms for practical use in environments with multiple data sources. In this paper, we present a knowledge model sharing based approach that learns a global knowledge model from pseudo-data generated according to anonymized knowledge models published by local data sources. Specifically, for the anonymization of knowledge models, we present two privacy measures for decision trees and an algorithm that obtains an anonymized decision tree by tree pruning. For pseudo-data generation, we present an algorithm that generates useful pseudo-data from decision trees. We empirically study our method by comparing it with several PPDM methods built on existing techniques, including three methods that publish anonymized data, one method that learns anonymized decision trees directly from the original data, and one method that uses ensemble classification. Our results show that, in both single-source and multiple-source environments and across several datasets, predictive models, and utility measures, our method can obtain significantly better predictive models (especially decision trees) than the other methods.
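The abstract does not spell out the two privacy measures or the pseudo-data generation algorithm, so what follows is only a minimal Python sketch of the general publish-prune-regenerate pipeline under stated assumptions. It assumes a binary decision tree over integer attributes whose leaves store per-class record counts. It uses a hypothetical minimum-records-per-leaf threshold `k` as a stand-in for the paper's privacy measures, and it draws pseudo-records uniformly within each leaf's region as a stand-in for the paper's generator. The names `Node`, `leaf_counts`, `prune_small_leaves`, and `generate_pseudo_data`, as well as the toy tree, are illustrative and not taken from the paper.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    # Internal node: integer attribute `attr` is tested as `value <= threshold`.
    # Leaf node: `attr` is None and `counts` maps class label -> record count.
    attr: Optional[str] = None
    threshold: int = 0
    left: Optional["Node"] = None   # branch for value <= threshold
    right: Optional["Node"] = None  # branch for value > threshold
    counts: Dict[str, int] = field(default_factory=dict)

def leaf_counts(node: Node) -> Dict[str, int]:
    """Sum the per-class counts of every leaf under `node`."""
    if node.attr is None:
        return dict(node.counts)
    total: Dict[str, int] = {}
    for child in (node.left, node.right):
        for label, n in leaf_counts(child).items():
            total[label] = total.get(label, 0) + n
    return total

def prune_small_leaves(node: Node, k: int) -> Node:
    """Return a pruned copy in which every leaf covers at least k records
    (unless the whole tree covers fewer than k).

    Hypothetical stand-in for the paper's privacy measures: a split whose
    smaller branch covers fewer than k records is collapsed into one leaf.
    """
    if node.attr is None:
        return node
    left = prune_small_leaves(node.left, k)
    right = prune_small_leaves(node.right, k)
    pruned = Node(node.attr, node.threshold, left, right)
    if min(sum(leaf_counts(c).values()) for c in (left, right)) < k:
        return Node(counts=leaf_counts(pruned))
    return pruned

def generate_pseudo_data(node: Node, bounds: Dict[str, Tuple[int, int]],
                         rng: random.Random) -> List[dict]:
    """Emit one pseudo-record per record counted at each leaf.

    Each attribute is drawn uniformly from the integer range implied by the
    root-to-leaf path; assumes every split threshold lies inside the domain.
    """
    if node.attr is None:
        rows = []
        for label, n in node.counts.items():
            for _ in range(n):
                row = {a: rng.randint(lo, hi) for a, (lo, hi) in bounds.items()}
                row["class"] = label
                rows.append(row)
        return rows
    lo, hi = bounds[node.attr]
    left_bounds = {**bounds, node.attr: (lo, min(hi, node.threshold))}
    right_bounds = {**bounds, node.attr: (max(lo, node.threshold + 1), hi)}
    return (generate_pseudo_data(node.left, left_bounds, rng)
            + generate_pseudo_data(node.right, right_bounds, rng))

# Usage on a toy tree (numbers invented for illustration only).
tree = Node("age", 40,
            left=Node(counts={"no": 30, "yes": 5}),
            right=Node("hours", 45,
                       left=Node(counts={"no": 8, "yes": 2}),
                       right=Node(counts={"yes": 20})))
anon = prune_small_leaves(tree, k=15)
domain = {"age": (18, 90), "hours": (1, 80)}
rows = generate_pseudo_data(anon, domain, random.Random(0))
print(anon.right.counts, len(rows))  # prints {'no': 8, 'yes': 22} 65
```

In a multiple-source setting, each source would publish its own pruned tree, and the pseudo-data generated from all published trees could be concatenated before training the global model with any classifier learner. Uniform sampling is a deliberately simple choice: it reproduces each leaf's class counts exactly but ignores any correlations among attributes within a leaf.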