The primary objective of privacy preservation is to protect individuals' confidential information in released data sets. In recent years, several simulation-based approaches to privacy preservation have been proposed. The idea is to generate a synthetic data set whose probability distribution is as close as possible to that of the original data set. In this paper, we propose two frameworks for simulation-based privacy preservation of multivariate numerical data. The first framework, called PRIMP (PRivacy preserving by Independent coMPonents), is based on independent component analysis (ICA). It is shown empirically that PRIMP outperforms other simulation-based approaches in terms of Spearman's rank correlation and Kendall's tau correlation. The second framework is a hybrid method that combines PRIMP with the Cholesky decomposition technique. It is shown empirically that the hybrid method preserves the covariance matrix of the original data exactly. The method also resolves the problem of generating good seeds for the Cholesky-based approach. Although the empirical results show that the hybrid approach is not always better than PRIMP in terms of Spearman's rank correlation and Kendall's tau correlation, in theory the risk of information leakage under the hybrid approach is much lower than under PRIMP.
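To illustrate how a Cholesky-based generator can reproduce a covariance matrix exactly, the following is a minimal sketch and not the method of the paper. It assumes plain Gaussian seeds, whereas the hybrid approach described above would take its seeds from PRIMP. The function names (`covariance`, `cholesky`, `synthesize`) and the toy data are hypothetical. The seeds are centered and whitened so that their sample covariance is the identity, then recolored with the Cholesky factor of the original covariance.

```python
import math
import random


def column_means(rows):
    n, d = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(d)]


def center(rows):
    means = column_means(rows)
    return [[r[j] - means[j] for j in range(len(means))] for r in rows]


def covariance(rows):
    """Sample covariance matrix (divisor n - 1)."""
    c = center(rows)
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in c) / (n - 1) for j in range(d)]
            for i in range(d)]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a positive definite matrix a."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def synthesize(original, seed=0):
    """Generate synthetic rows with the same sample mean and covariance.

    Assumption: seeds are i.i.d. standard Gaussian draws; the paper's hybrid
    method would use PRIMP output as seeds instead.
    """
    rng = random.Random(seed)
    n, d = len(original), len(original[0])
    means = column_means(original)
    L_x = cholesky(covariance(original))

    seeds = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    centered = center(seeds)
    L_z = cholesky(covariance(seeds))

    synthetic = []
    for z in centered:
        w = solve_lower(L_z, z)  # whitened seed: sample covariance is identity
        synthetic.append([means[i] + sum(L_x[i][k] * w[k] for k in range(d))
                          for i in range(d)])
    return synthetic


# Toy data, purely illustrative.
original = [[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [4.0, 3.0], [5.0, 6.0]]
synthetic = synthesize(original, seed=0)
c_orig, c_syn = covariance(original), covariance(synthetic)
max_diff = max(abs(c_orig[i][j] - c_syn[i][j])
               for i in range(2) for j in range(2))
print("largest covariance difference:", max_diff)
```

The covariance matches only up to floating-point rounding, and it requires more records than attributes so that both covariance matrices are positive definite. Matching the covariance says nothing about rank-based measures such as Spearman's rank correlation or Kendall's tau, which depend on the choice of seeds.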