A distributed approach to enabling privacy-preserving model-based classifier training

Authors:
Hangzai Luo; Jianping Fan; Xiaodong Lin; Aoying Zhou; Elisa Bertino
Affiliations:
East China Normal University, Shanghai Key Lab of Trustworthy Computing, Shanghai, China
University of North Carolina, Department of Computer Science, Charlotte, NC 28223, USA
University of Cincinnati, Department of Mathematical Sciences, Cincinnati, OH 45221, USA
East China Normal University, Shanghai Key Lab of Trustworthy Computing, Shanghai, China
Purdue University, Department of Computer Science, West Lafayette, IN 47907, USA
Venue:
Knowledge and Information Systems
Year:
2009

Abstract

This paper proposes a novel approach to privacy-preserving distributed training of model-based classifiers, an important step towards supporting customizable privacy modeling and protection. The approach consists of three major steps. First, each data site independently learns a weak concept model (i.e., a local classifier) for a given data pattern or concept, using only its own training samples; an adaptive EM algorithm is proposed to select the model structure and estimate the model parameters simultaneously. Second, a combined classifier is trained by integrating the weak concept models shared by multiple data sites. To reduce data transmission costs and potential privacy breaches, only the weak concept models are sent to the central site, where synthetic samples are generated directly from them. Both the shared weak concept models and the synthetic samples are then used to learn a reliable and complete global concept model, and a computational approach is developed to automatically achieve a good trade-off among privacy disclosure risk, sharing benefit, and data utility. Third, the combined classifier is validated by distributing the global concept model to all data sites in the collaboration network while limiting potential privacy breaches. The approach has been validated through extensive experiments on four UCI machine learning data sets and two image data sets.
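The first two steps summarized above can be illustrated with a minimal, self-contained sketch. It rests on assumptions the abstract does not state: each weak concept model is taken to be a diagonal-covariance Gaussian mixture per label; a plain EM fit, swept over the number of components K and scored with BIC, substitutes for the paper's adaptive EM; and the central site refits a per-label mixture on the pooled synthetic samples alone (the paper also incorporates the shared models directly). All function names (`train_site`, `train_global`, `classify`, and the helpers) are hypothetical.

```python
import math
import random

# Hypothetical sketch: diagonal Gaussian mixtures stand in for the paper's weak
# concept models, and a BIC sweep over K stands in for its adaptive EM.
# Only (weights, means, variances) ever leave a data site.

def log_gauss_diag(x, mean, var):
    """log N(x | mean, diag(var)) for one point."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def logsumexp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def gmm_loglik(x, model):
    w, m, v = model
    return logsumexp([math.log(w[c]) + log_gauss_diag(x, m[c], v[c]) for c in range(len(w))])

def fit_gmm(X, k, iters=30, seed=0, eps=1e-3):
    """Plain EM for a k-component diagonal Gaussian mixture."""
    rng, d, n = random.Random(seed), len(X[0]), len(X)
    means = [list(p) for p in rng.sample(X, k)]           # initialise means from data points
    mu = [sum(x[j] for x in X) / n for j in range(d)]
    var0 = [sum((x[j] - mu[j]) ** 2 for x in X) / n + eps for j in range(d)]
    variances, weights = [list(var0) for _ in range(k)], [1.0 / k] * k
    for _ in range(iters):
        resp = []                                          # E-step: responsibilities
        for x in X:
            lp = [math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c]) for c in range(k)]
            z = logsumexp(lp)
            resp.append([math.exp(lpc - z) for lpc in lp])
        for c in range(k):                                 # M-step: re-estimate parameters
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, X)) / nc for j in range(d)]
            variances[c] = [sum(r[c] * (x[j] - means[c][j]) ** 2 for r, x in zip(resp, X)) / nc + eps
                            for j in range(d)]
    return weights, means, variances

def fit_with_bic(X, k_max=3):
    """Model selection: keep the K with the lowest BIC."""
    d, best, best_bic = len(X[0]), None, float("inf")
    for k in range(1, min(k_max, len(X)) + 1):
        model = fit_gmm(X, k)
        ll = sum(gmm_loglik(x, model) for x in X)
        n_params = (k - 1) + 2 * k * d                     # weights + means + variances
        bic = -2.0 * ll + n_params * math.log(len(X))
        if bic < best_bic:
            best, best_bic = model, bic
    return best

def train_site(samples):
    """Step 1 (local): one mixture per label, fit on this site's own data only."""
    return {label: fit_with_bic(points) for label, points in samples.items()}

def sample_gmm(model, n, rng):
    w, m, v = model
    out = []
    for _ in range(n):
        c = rng.choices(range(len(w)), weights=w)[0]       # pick a component by weight
        out.append([rng.gauss(mu, math.sqrt(var)) for mu, var in zip(m[c], v[c])])
    return out

def train_global(shared_models, n_synth=200, seed=1):
    """Step 2 (central): pool synthetic samples from every shared model, refit per label."""
    rng, pooled = random.Random(seed), {}
    for site_models in shared_models:
        for label, model in site_models.items():
            pooled.setdefault(label, []).extend(sample_gmm(model, n_synth, rng))
    return {label: fit_with_bic(points) for label, points in pooled.items()}

def classify(global_model, x):
    """Pick the label whose mixture gives x the highest likelihood (equal priors assumed)."""
    return max(global_model, key=lambda label: gmm_loglik(x, global_model[label]))

# Toy run: two hypothetical sites, each holding 2-D samples for labels 0 and 1.
def make_site(offset, n, rng):
    return {0: [[rng.gauss(offset, 0.5), rng.gauss(offset, 0.5)] for _ in range(n)],
            1: [[rng.gauss(4 + offset, 0.5), rng.gauss(4 + offset, 0.5)] for _ in range(n)]}

data_rng = random.Random(42)
shared = [train_site(make_site(0.0, 60, data_rng)), train_site(make_site(0.5, 60, data_rng))]
global_model = train_global(shared)
print(classify(global_model, [0.0, 0.0]), classify(global_model, [4.0, 4.0]))  # prints: 0 1
```

Equal class priors are used because, in this sketch, sites share no sample counts. The privacy/utility trade-off from step 2 is not modeled; step 3 would roughly correspond to each site calling `classify` on its own held-out samples.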