Improving Microaggregation for Complex Record Anonymization
MDAI '08: Proceedings of the 5th International Conference on Modeling Decisions for Artificial Intelligence, Sabadell
Microaggregation is one of the most widely used microdata protection methods. The idea is to build clusters of at least k original records and then replace the records with the centroid of their cluster. When the dataset has many attributes, a common practice is to split it into smaller blocks of attributes and apply microaggregation successively and independently to each block. This reduces the noise introduced by microaggregation, but at the cost of losing the k-anonymity property. The goal of this work is to show that, besides the specific microaggregation method employed, the value of the parameter k, and the number of blocks into which the dataset is split, there is another factor that can influence the quality of the microaggregation: the way in which the attributes are grouped to form the blocks. When correlated attributes are grouped in the same block, the statistical utility of the protected dataset is higher. In contrast, when correlated attributes are dispersed into different blocks, the achieved anonymity is higher and, therefore, the disclosure risk is lower. We present quantitative evaluations of these statements based on experiments on real datasets.
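To make the core idea concrete, the following is a minimal sketch of fixed-size univariate microaggregation: records are sorted, grouped into clusters of at least k consecutive values, and each record is replaced by its cluster centroid. This is an illustration of the general technique only, not the specific method evaluated in the paper; splitting a multi-attribute dataset into blocks would amount to applying a procedure like this independently to each block of attributes.

```python
def microaggregate(values, k):
    """Fixed-size univariate microaggregation sketch: sort the records,
    form consecutive groups of at least k of them, and replace every
    record with its group centroid (the mean)."""
    n = len(values)
    # Indices of the records in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    start = 0
    while start < n:
        # The last group absorbs the remainder, so every group
        # contains at least k records (fixed-size heuristic).
        end = n if n - start < 2 * k else start + k
        group = order[start:end]
        centroid = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = centroid
        start = end
    return out
```

Because every centroid stands in for at least k original records, each released value on this attribute is shared by k or more records, which is the source of the k-anonymity property mentioned above.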