A data distortion by probability distribution
ACM Transactions on Database Systems (TODS)
Completeness theorems for non-cryptographic fault-tolerant distributed computation
STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
Sample-based non-uniform random variate generation
WSC '86 Proceedings of the 18th conference on Winter simulation
A General Additive Data Perturbation Method for Database Security
Management Science
Privacy-preserving data mining
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Data mining: concepts and techniques
Data mining: concepts and techniques
On the design and quantification of privacy preserving data mining algorithms
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Cryptographic techniques for privacy-preserving data mining
ACM SIGKDD Explorations Newsletter
Randomization in privacy preserving data mining
ACM SIGKDD Explorations Newsletter
k-anonymity: a model for protecting privacy
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Privacy preserving mining of association rules
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Disclosure Limitation of Sensitive Rules
KDEX '99 Proceedings of the 1999 Workshop on Knowledge and Data Engineering Exchange
On the Privacy Preserving Properties of Random Data Perturbation Techniques
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Pattern Classification (2nd Edition)
Pattern Classification (2nd Edition)
Deriving private information from randomized data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
A privacy-sensitive approach to distributed clustering
Pattern Recognition Letters - Special issue: Advances in pattern recognition
A new scheme on privacy-preserving data classification
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
IEEE Transactions on Knowledge and Data Engineering
Privacy Preserving Data Classification with Rotation Perturbation
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
How to generate and exchange secrets
SFCS '86 Proceedings of the 27th Annual Symposium on Foundations of Computer Science
Our data, ourselves: privacy via distributed noise generation
EUROCRYPT'06 Proceedings of the 24th annual international conference on The Theory and Applications of Cryptographic Techniques
Polylogarithmic private approximations and efficient matching
TCC'06 Proceedings of the Third conference on Theory of Cryptography
PinKDD'07: privacy, security, and trust in KDD post-workshop report
ACM SIGKDD Explorations Newsletter - Special issue on visual analytics
Cloud-enabled privacy-preserving collaborative learning for mobile sensing
Proceedings of the 10th ACM Conference on Embedded Network Sensor Systems
Bands of privacy preserving objectives: classification of PPDM strategies
AusDM '11 Proceedings of the Ninth Australasian Data Mining Conference - Volume 121
Hi-index | 0.00 |
Data mining tasks such as supervised classification can often benefit from a large training dataset. However, in many application domains, privacy concerns can hinder the construction of an accurate classifier by combining datasets from multiple sites. In this work, we propose a novel privacy-preserving distributed data sanitization algorithm that randomizes the private data at each site independently before the data is pooled to form a classifier at a centralized site. Distance-preserving perturbation approaches have been proposed by other researchers but we show that they can be susceptible to security risks. To enhance security, we require a unique non-distance-preserving approach. We use Kernel Density Estimation (KDE) Resampling, where samples are drawn independently from a distribution that is approximately equal to the original data's distribution. KDE Resampling provides consistent density estimates with randomized samples that are asymptotically independent of the original samples. This ensures high accuracy, especially when a large number of samples is available, with low privacy loss. We evaluated our approach on five standard datasets in a distributed setting using three different classifiers. The classification errors only deteriorated by 3% (in the worst case) when we used the randomized data instead of the original private data. With a large number of samples, KDE Resampling effectively preserves privacy (due to the asymptotic independence property) and also maintains the necessary data integrity for constructing accurate classifiers (due to consistency).