The extensive use of information technologies by organizations to collect and share personal data has raised strong privacy concerns. In response to the public's demand for data privacy, clustering-based data masking techniques are increasingly used for privacy-preserving data sharing and analytics. Although traditional clustering-based approaches for masking numeric attributes address reidentification risk, they typically do not consider the disclosure risk of categorical confidential attributes. We propose a new approach to this problem. The proposed method clusters data so that the data points within a group have similar nonconfidential attribute values, while the confidential attribute values within a group are well distributed. To accomplish this, the clustering method, which is based on a minimum spanning tree (MST) technique, applies two risk-utility trade-off measures: one in the growing stage and one in the pruning stage of the MST technique. As part of our approach, we also propose a novel cluster-level microperturbation method for masking data. It overcomes a common limitation of traditional clustering-based masking methods: their inability to preserve important statistical properties such as the variance of each attribute and the covariance between attributes. We show that the mean vector and the covariance matrix of the masked data generated by the microperturbation method are unbiased estimates of the original mean vector and covariance matrix. An experimental study on several real-world data sets demonstrates the effectiveness of the proposed approach. This paper was accepted by Sandra Slaughter, information systems.
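The abstract does not give the microperturbation formulas, so the sketch below is not the authors' method. It substitutes a simpler, plainly different technique, exact per-cluster moment matching, to illustrate why working at the cluster level can keep the variance and covariance that plain centroid replacement shrinks. The overall mean vector and covariance matrix are determined by the cluster sizes, cluster means and cluster covariance matrices, so if each cluster's masked records reproduce that cluster's mean and covariance exactly, the global mean and covariance are preserved exactly as well (a deterministic analogue of the unbiasedness the abstract claims). Everything here is an assumption for illustration: the hypothetical function name microperturb, the Gaussian draws, the requirement that every cluster hold at least p + 1 records for p numeric attributes, and the cluster labels, which stand in for the MST-based clustering and do nothing about the diversity of categorical confidential attributes.

```python
import numpy as np


def _sym_sqrt(mat, inverse=False):
    """Symmetric (inverse) square root of a symmetric PSD matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative round-off eigenvalues
    if inverse:
        if np.any(vals <= 1e-12):
            raise np.linalg.LinAlgError("matrix is numerically singular")
        vals = 1.0 / np.sqrt(vals)
    else:
        vals = np.sqrt(vals)
    return (vecs * vals) @ vecs.T


def microperturb(X, labels, rng=None):
    """Hypothetical helper (not the paper's method): replace each cluster's
    records with random records having exactly that cluster's sample mean
    vector and sample covariance matrix (ddof = 1)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n_attr = X.shape[1]
    masked = np.empty_like(X)
    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)
        if idx.size < n_attr + 1:
            # Assumption: fewer records cannot yield an invertible noise covariance.
            raise ValueError(f"cluster {g} needs at least {n_attr + 1} records")
        Xg = X[idx]
        mean_g = Xg.mean(axis=0)
        cov_g = np.atleast_2d(np.cov(Xg, rowvar=False))
        # Draw Gaussian noise, center it, and whiten it so its sample covariance is I.
        Zc = rng.standard_normal((idx.size, n_attr))
        Zc -= Zc.mean(axis=0)
        Zw = Zc @ _sym_sqrt(np.atleast_2d(np.cov(Zc, rowvar=False)), inverse=True)
        # Recolor with the cluster covariance and shift to the cluster mean.
        masked[idx] = Zw @ _sym_sqrt(cov_g) + mean_g
    return masked


# Usage: 12 synthetic records, 2 attributes, 3 hypothetical clusters of 4 records.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(12, 2)) @ np.array([[2.0, 0.5], [0.0, 1.0]]) + [10.0, 5.0]
labels = np.repeat([0, 1, 2], 4)
Y = microperturb(X, labels, rng=1)
print(np.allclose(X.mean(axis=0), Y.mean(axis=0)))                    # True
print(np.allclose(np.cov(X, rowvar=False), np.cov(Y, rowvar=False)))  # True
```

Matching moments inside each cluster, rather than only globally, also keeps the between-cluster structure, because the cluster means themselves are untouched. The price is that every cluster needs more records than there are masked attributes.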