A modification of the Lloyd algorithm for k-anonymous quantization

Authors:
David Rebollo-Monedero; Jordi Forné; Esteve Pallarès; Javier Parra-Arnau
Affiliations:
Dept. of Telematics Engineering, Universitat Politècnica de Catalunya, C. Jordi Girona 1-3, E-08034 Barcelona, Spain (all four authors)
Venue:
Information Sciences: an International Journal
Year:
2013

Abstract

We address the problem of designing quantizers that cluster data while satisfying a k-anonymity requirement. We adopt a general data compression perspective that considers both discrete and continuous probability distributions, together with corresponding constraints on both cell sizes and quantizer index probabilities. Potential applications of this problem extend well beyond the important case of microdata anonymization to include optimized task allocation under workload constraints. Our contribution is twofold. First and most importantly, we present a theoretical analysis establishing the optimality conditions that probability-constrained quantizers must satisfy, thereby characterizing optimal k-anonymous aggregation as a special case. Second, building on this analysis, we propose an alternating optimization algorithm for designing quantizers of this type. Our algorithm is conceptually motivated by the popular Lloyd-Max algorithm for quantizer design, originally intended for data compression and also known as the k-means method. Experimental results on synthetic and real data, with mean squared error as the distortion measure, confirm that our method outperforms MDAV, a popular fixed-size microaggregation algorithm for statistical disclosure control. The improvement is in data utility under exactly the same k-anonymity constraint, but it comes at the expense of higher computational complexity.
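The abstract positions the proposed method between two reference points: the unconstrained Lloyd (k-means) iteration and the MDAV microaggregation baseline. The following sketch contrasts those two reference points on toy data. It is written in Python with NumPy, which is an assumption on our part, and it is not the authors' probability-constrained algorithm, which the abstract does not specify in enough detail to reproduce. The names `lloyd`, `mdav` and `mse` are hypothetical. The MDAV routine follows the usual fixed-size formulation: groups of k records, with one final group of k to 2k-1 records.

```python
import numpy as np


def mse(X, groups):
    """Distortion when every record is replaced by its group centroid:
    squared error summed over attributes, averaged over records."""
    err = sum(((X[g] - X[g].mean(axis=0)) ** 2).sum() for g in groups)
    return err / len(X)


def lloyd(X, init, iters=50):
    """Classic, unconstrained Lloyd (k-means) iteration under MSE.
    Nothing here enforces a minimum number of records per cell."""
    C = np.array(init, dtype=float)
    for _ in range(iters):
        # Nearest-neighbour condition: map each record to its closest reconstruction point.
        dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        # Centroid condition: move each reconstruction point to the mean of its cell.
        for q in range(len(C)):
            if np.any(labels == q):
                C[q] = X[labels == q].mean(axis=0)
    return labels, C


def mdav(X, k):
    """MDAV microaggregation: groups of k records, except one final group
    of k to 2k-1 records. Assumes len(X) >= k. Returns index arrays."""
    remaining = np.arange(len(X))
    groups = []

    def take_group(center, pool):
        # The record `center` together with its k-1 nearest neighbours in `pool`.
        d = ((X[pool] - X[center]) ** 2).sum(axis=1)
        chosen = pool[np.argsort(d, kind="stable")[:k]]
        return chosen, np.setdiff1d(pool, chosen)

    def farthest(pool, point):
        # Index (within X) of the record in `pool` farthest from `point`.
        return pool[np.argmax(((X[pool] - point) ** 2).sum(axis=1))]

    while len(remaining) >= 3 * k:
        r = farthest(remaining, X[remaining].mean(axis=0))  # farthest from centroid
        g, remaining = take_group(r, remaining)
        groups.append(g)
        # s is picked after r's group is removed; this matches the usual rule
        # (farthest record from r) except when there are distance ties.
        s = farthest(remaining, X[r])
        g, remaining = take_group(s, remaining)
        groups.append(g)
    if len(remaining) >= 2 * k:
        r = farthest(remaining, X[remaining].mean(axis=0))
        g, remaining = take_group(r, remaining)
        groups.append(g)
    groups.append(remaining)  # leftover k..2k-1 records form the last group
    return groups


# Toy 1-D data: nine clustered records and one outlier.
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 100], dtype=float).reshape(-1, 1)
labels, _ = lloyd(X, init=[[0.0], [5.0], [100.0]])
cells = [np.flatnonzero(labels == q) for q in range(3)]
print([len(c) for c in cells], mse(X, cells))  # [4, 5, 1] 1.5
groups = mdav(X, k=3)
print([len(g) for g in groups], round(mse(X, groups), 2))  # [3, 3, 4] 571.17
```

On this toy input, Lloyd reaches a much lower MSE only because it isolates the outlier in a singleton cell, which violates 3-anonymity. MDAV satisfies the constraint but must absorb the outlier into a group of three records. The trade-off between those two extremes is the one the probability-constrained design described above is meant to optimize.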