Privacy preservation by independent component analysis and variance control

Authors:
Chih-Ming Hsu; Ming-Syan Chen
Affiliations:
National Taiwan University, Taipei, Taiwan ROC; National Taiwan University & Academia Sinica, Taipei, Taiwan ROC
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

The primary objective of privacy preservation is to protect individuals' confidential information in released data sets. In recent years, several simulation-based approaches to privacy preservation have been proposed. Their idea is to generate a synthetic data set whose probability distribution is as close as possible to that of the original data set. In this paper, we propose two frameworks for simulation-based privacy preservation of multivariate numerical data. The first framework, called PRIMP (PRivacy preserving by Independent coMPonents), is based on independent component analysis (ICA). Empirical results show that PRIMP outperforms other simulation-based approaches in terms of Spearman's rank correlation and Kendall's tau correlation. The second framework is a hybrid method that combines PRIMP with the Cholesky decomposition technique. Empirical results show that the hybrid method preserves the covariance matrix of the original data exactly; the method also resolves the problem of generating good seeds for the Cholesky-based approach. Although the empirical results show that the hybrid approach does not always outperform PRIMP in terms of Spearman's rank correlation and Kendall's tau correlation, in theory its risk of information leakage is much lower than that of PRIMP.
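The covariance-preserving step of a Cholesky-based generator can be sketched as follows. This is a minimal, hypothetical illustration of the general technique, not the authors' hybrid algorithm. The seed rows here are plain uniform noise standing in for the PRIMP-generated seeds described above, and all function names (`match_covariance`, `cholesky`, etc.) are our own. Each seed row is whitened with the inverse Cholesky factor of the seed covariance and then re-colored with the Cholesky factor of the original covariance, so the synthetic rows reproduce the original sample mean and covariance up to floating-point error.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def mean_vector(data: Matrix) -> List[float]:
    """Column means of an n x p data matrix stored as a list of rows."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def covariance(data: Matrix) -> Matrix:
    """Sample covariance matrix (divisor n - 1)."""
    n, p = len(data), len(data[0])
    mu = mean_vector(data)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in data) / (n - 1)
             for j in range(p)] for i in range(p)]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T; a must be symmetric positive definite."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L: Matrix, b: List[float]) -> List[float]:
    """Solve L x = b by forward substitution (L lower triangular)."""
    x: List[float] = []
    for i in range(len(b)):
        s = sum(L[i][k] * x[k] for k in range(i))
        x.append((b[i] - s) / L[i][i])
    return x


def match_covariance(original: Matrix, seed: Matrix) -> Matrix:
    """Map seed rows so their sample mean and covariance equal those of `original`."""
    p = len(original[0])
    mu_x, mu_z = mean_vector(original), mean_vector(seed)
    L_x = cholesky(covariance(original))
    L_z = cholesky(covariance(seed))
    synthetic = []
    for z in seed:
        # Whiten: over all seed rows, w = L_z^{-1} (z - mu_z) has identity sample covariance.
        w = forward_solve(L_z, [z[j] - mu_z[j] for j in range(p)])
        # Re-color: y = mu_x + L_x w (L_x is lower triangular, so only k <= i contributes).
        synthetic.append([mu_x[i] + sum(L_x[i][k] * w[k] for k in range(i + 1))
                          for i in range(p)])
    return synthetic


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical confidential records: three correlated numeric attributes.
    original = []
    for _ in range(200):
        age = rng.gauss(45, 12)
        income = 1.5 * age + rng.gauss(0, 10)
        original.append([age, income, 0.2 * income + rng.gauss(0, 3)])
    # Stand-in seeds: independent uniform noise, NOT the PRIMP/ICA output.
    seeds = [[rng.random() for _ in range(3)] for _ in range(200)]
    synthetic = match_covariance(original, seeds)
    for row_o, row_s in zip(covariance(original), covariance(synthetic)):
        print([round(v, 4) for v in row_o], [round(v, 4) for v in row_s])
```

Why the match is exact: with w_i = L_Z^{-1}(z_i - mean(z)) and y_i = mean(x) + L_X w_i, the sample covariance of the y_i is L_X L_Z^{-1} S_Z L_Z^{-T} L_X^T = L_X L_X^T = S_X. The transformation is linear, so it only fixes first and second moments; the marginal shapes and rank-based measures such as Spearman's rho and Kendall's tau still depend on the seeds, which is why the quality of the seed generator matters.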