Class-Restricted Clustering and Microperturbation for Data Privacy

Authors:
Xiao-Bai Li; Sumit Sarkar
Affiliations:
Department of Operations and Information Systems, University of Massachusetts Lowell, Lowell, Massachusetts 01854; School of Management, University of Texas at Dallas, Richardson, Texas 75080
Venue:
Management Science
Year:
2013

Abstract

The extensive use of information technologies by organizations to collect and share personal data has raised strong privacy concerns. To respond to the public's demand for data privacy, a class of clustering-based data masking techniques is increasingly being used for privacy-preserving data sharing and analytics. Although they address reidentification risks, traditional clustering-based approaches for masking numeric attributes typically do not consider the disclosure risk of categorical confidential attributes. We propose a new approach to deal with this problem. The proposed method clusters data such that the data points within a group are similar in the nonconfidential attribute values, whereas the confidential attribute values within a group are well distributed. To accomplish this, the clustering method, which is based on a minimum spanning tree (MST) technique, uses two risk-utility trade-off measures in the growing and pruning stages of the MST technique, respectively. As part of our approach, we also propose a novel cluster-level microperturbation method for masking data that overcomes a common problem of traditional clustering-based methods for data masking, which is their inability to preserve important statistical properties such as the variance of attributes and the covariance across attributes. We show that the mean vector and the covariance matrix of the masked data generated using the microperturbation method are unbiased estimates of the original mean vector and covariance matrix. An experimental study on several real-world data sets demonstrates the effectiveness of the proposed approach. This paper was accepted by Sandra Slaughter, information systems.
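
The contrast between centroid-based masking and cluster-level perturbation can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify how the microperturbation noise is generated or how clusters are formed, so the code below assumes (hypothetically) that each cluster's records are replaced with draws from a multivariate normal distribution matching that cluster's sample mean and covariance, and that clusters are simply groups of 10 records ranked on the first attribute. The function names `microaggregate` and `microperturb` are illustrative only.

```python
import numpy as np

def microaggregate(X, labels):
    """Traditional masking: replace every record with its cluster centroid."""
    masked = np.empty_like(X, dtype=float)
    for c in np.unique(labels):
        idx = labels == c
        masked[idx] = X[idx].mean(axis=0)
    return masked

def microperturb(X, labels, rng):
    """Assumed perturbation scheme (not taken from the paper): replace each
    cluster's records with normal draws that share the cluster's sample
    mean vector and sample covariance matrix."""
    masked = np.empty_like(X, dtype=float)
    for c in np.unique(labels):
        idx = labels == c
        mu = X[idx].mean(axis=0)
        cov = np.atleast_2d(np.cov(X[idx], rowvar=False))
        masked[idx] = rng.multivariate_normal(mu, cov, size=int(idx.sum()))
    return masked

rng = np.random.default_rng(0)

# Synthetic data: two correlated numeric attributes.
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000)

# Stand-in clustering: groups of 10 records by rank on the first attribute.
labels = np.argsort(np.argsort(X[:, 0])) // 10

agg = microaggregate(X, labels)
pert = microperturb(X, labels, rng)

print("original covariance:\n", np.cov(X, rowvar=False))
print("microaggregated covariance:\n", np.cov(agg, rowvar=False))
print("microperturbed covariance:\n", np.cov(pert, rowvar=False))
```

Under these assumptions, replacing records with centroids removes all within-cluster variation, so the masked variance of the second attribute shrinks, while the perturbed data retain within-cluster spread and their overall covariance stays close to the original. The sketch only illustrates why preserving cluster-level moments helps; the unbiasedness result stated in the abstract refers to the authors' own method.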