Hybrid microdata using microaggregation

Authors:
Josep Domingo-Ferrer;Úrsula González-Nicolás
Affiliations:
Universitat Rovira i Virgili, Dept. of Computer Engineering and Mathematics, UNESCO Chair in Data Privacy, Av. Països Catalans 26, E-43007 Tarragona, Catalonia, Spain;Universitat Rovira i Virgili, Dept. of Computer Engineering and Mathematics, UNESCO Chair in Data Privacy, Av. Països Catalans 26, E-43007 Tarragona, Catalonia, Spain
Venue:
Information Sciences: an International Journal
Year:
2010

Abstract

Statistical disclosure control (also known as privacy-preserving data mining) of microdata is about releasing data sets that contain the answers of individual respondents, protected in such a way that: (i) the respondents corresponding to the released records cannot be re-identified; (ii) the released data stay analytically useful. Usually, the protected data set is generated either by masking (i.e., perturbing) the original data or by generating synthetic (i.e., simulated) data that preserve some pre-selected statistics of the original data. Masked data may approximately preserve a broad range of distributional characteristics, although very few of them (if any) are exactly preserved; synthetic data, on the other hand, exactly preserve the pre-selected statistics and may seem less disclosive than masked data, but they preserve no statistics other than those pre-selected. Hybrid data, obtained by mixing the original data and synthetic data, have been proposed in the literature to combine the strengths of masked and synthetic data. We show how to easily obtain hybrid data by combining microaggregation with any synthetic data generator. We show that numerical hybrid data that exactly preserve the means and covariances of the original data, and approximately preserve other statistics as well as some subdomain analyses, can be obtained as a particular case with a very simple parameterization. The new method is competitive with both the hybrid data methods in the literature and plain multivariate microaggregation.
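
The idea of combining microaggregation with a synthetic data generator can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract: the partition heuristic (sorting records along the first principal component and cutting the ordering into groups of at least k records) is a simple stand-in for a real microaggregation algorithm, the generator (whitened Gaussian noise rescaled to each group's mean and covariance) is only one possible choice, and all function names are hypothetical. Each group must contain more records than there are attributes for the whitening step to work.

```python
import numpy as np


def _sym_sqrt(m):
    """Symmetric square root of a positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def _inv_sym_sqrt(m):
    """Inverse symmetric square root of a positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v / np.sqrt(w)) @ v.T


def partition_min_size(x, k):
    """Illustrative partition (an assumption, not a published heuristic):
    order records along the first principal component and split the
    ordering into groups that each hold at least k records."""
    if len(x) < k:
        raise ValueError("need at least k records")
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    order = np.argsort(centered @ vt[0])
    return np.array_split(order, len(x) // k)


def synth_same_moments(group, rng):
    """Draw synthetic records whose mean and covariance equal those of
    `group` exactly (up to floating-point error)."""
    n, d = group.shape
    if n <= d:
        raise ValueError("group needs more records than attributes")
    mean = group.mean(axis=0)
    cov = np.cov(group, rowvar=False, bias=True).reshape(d, d)
    z = rng.standard_normal((n, d))
    z -= z.mean(axis=0)                    # zero mean
    z = z @ _inv_sym_sqrt((z.T @ z) / n)   # identity covariance
    return mean + z @ _sym_sqrt(cov)


def hybrid_microdata(x, k, seed=0):
    """Replace every group of the partition by synthetic records that
    share the group's mean and covariance."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    out = np.empty_like(x)
    for idx in partition_min_size(x, k):
        out[idx] = synth_same_moments(x[idx], rng)
    return out
```

Because every group keeps its own mean and covariance, both the within-group and between-group scatter are unchanged, so the means and covariances of the whole data set are preserved as well. In this sketch, a larger k gives groups that are replaced by more loosely constrained synthetic records, while a small k keeps the output closer to the original records.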