The mean vector and covariance matrix are sufficient statistics when the underlying distribution is multivariate normal. Many types of statistical analysis used in practice rely on the assumption of multivariate normality (the Gaussian model). For these analyses, if the masked data have the same mean vector and covariance matrix as the original data, then analyzing the masked data with these techniques yields the same results as analyzing the original data. For numerical confidential data, a recently proposed perturbation method makes it possible to keep the mean vector and covariance matrix of the masked data exactly equal to those of the original data. However, as currently proposed, this method produces perturbed values that are considered synthetic, because they are generated without regard to the values of the confidential variables (they are based only on the non-confidential variables). Some researchers argue that synthetic data result in information loss. In this study, we provide a new methodology for generating non-synthetic perturbed data whose mean vector and covariance matrix are exactly equal to those of the original data, while offering a selectable degree of similarity between the original and perturbed data.
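The exact-moment property can be illustrated with a minimal Python sketch. This is not the authors' methodology. It ignores the non-confidential variables, and the mixing weight `alpha`, the Gaussian noise, and all helper names are illustrative assumptions. The sketch mixes the centered original data with random noise, then applies an affine whitening-and-recoloring transform built from Cholesky factors. The result has a sample mean vector and covariance matrix equal to those of the original data, up to floating-point rounding. With `alpha = 1` the original data come back unchanged. With `alpha = 0` the output depends on the original data only through its mean vector and covariance matrix.

```python
import math
import random


def col_means(rows):
    """Column means of an n x p data set stored as a list of rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows):
    """Sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    mu = col_means(rows)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def cholesky(s):
    """Lower-triangular L with L L^T == s (s must be positive definite)."""
    p = len(s)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = s[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L x = b for lower-triangular L by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def lower_matvec(L, v):
    """Multiply lower-triangular L by the vector v."""
    return [sum(L[i][k] * v[k] for k in range(i + 1)) for i in range(len(v))]


def perturb(x, alpha, seed=None):
    """Hypothetical helper: blend x with noise, then force the sample mean
    and covariance of the output to equal those of x exactly.
    alpha in [0, 1] sets the similarity to x (1 returns x itself)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)
    p = len(x[0])
    mu = col_means(x)
    lx = cholesky(covariance(x))
    # Step 1: mix centered originals with Gaussian noise whose population
    # covariance is the sample covariance of x (assumed noise model).
    z = []
    for row in x:
        e = lower_matvec(lx, [rng.gauss(0.0, 1.0) for _ in range(p)])
        z.append([alpha * (row[j] - mu[j]) + math.sqrt(1 - alpha ** 2) * e[j]
                  for j in range(p)])
    # Step 2: whiten z using its own sample moments (mean 0, covariance I).
    mz = col_means(z)
    lz = cholesky(covariance(z))
    w = [forward_solve(lz, [r[j] - mz[j] for j in range(p)]) for r in z]
    # Step 3: recolor with the Cholesky factor of x and shift by its mean.
    return [[v + m for v, m in zip(lower_matvec(lx, wi), mu)] for wi in w]
```

For example, `perturb(data, 0.5, seed=1)` returns rows whose `col_means` and `covariance` match those of `data` to within rounding, while `perturb(data, 1.0)` reproduces `data`. Cholesky factors are only one convenient choice. Whitening with any matrix square root of the covariance of `z`, and recoloring with any matrix square root of the covariance of `x`, would also force the moments to match.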