TFRP: An efficient microaggregation algorithm for statistical disclosure control

Authors:
Chin-Chen Chang;Yu-Chiang Li;Wen-Hung Huang
Affiliations:
Department of Information Engineering and Computer Science, Feng Chia University, 100 Wenhwa Rd., Seatwen, Taichung 40724, Taiwan, ROC and Department of Computer Science and Information Engineerin ...;Department of Computer Science and Information Engineering, National Chung Cheng University, 168, University Rd., San-Hsing, Min-Hsiung, Chiayi 62102, Taiwan, ROC;Institute of Information Systems and Applications, National Tsing Hua University, 101, Section 2, Kuang-Fu Rd., Hsinchu 30013, Taiwan, ROC
Venue:
Journal of Systems and Software
Year:
2007

Abstract

Statistical disclosure control (SDC) has recently attracted much attention. SDC is an important part of data security that deals with protecting databases. Microaggregation is an SDC technique widely used to protect confidentiality in statistical databases released for public use. In microaggregation, similar records are clustered into groups, each containing at least k records to prevent disclosure of individual information, where k is a predefined security threshold. For a given k, an optimal multivariate microaggregation is one with the lowest information loss, and finding it is an NP-hard problem. Existing fixed-size techniques obtain low information loss with O(n^2) or O(n^3/k) time complexity. To improve execution time and lower information loss, this study proposes the Two Fixed Reference Points (TFRP) method, a two-phase algorithm for microaggregation. In the first phase, TFRP employs pre-computing and median-of-medians techniques to shorten its running time to O(n^2/k). In the second phase, to decrease information loss, TFRP generates variable-size groups by removing the less homogeneous groups. Experimental results show that the proposed method is significantly faster than the Diameter and Centroid methods. On several test datasets, TFRP also significantly reduces information loss, particularly on sparse datasets with a large k.
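
The group-size constraint and the notion of information loss described in the abstract can be illustrated with a minimal Python sketch. This is not the TFRP algorithm: the grouping rule (sort records by distance to the global centroid, then cut into chunks of k) is a deliberately naive stand-in, and the loss measure SSE/SST is an assumption, since the abstract does not define information loss. The function names are hypothetical.

```python
from math import fsum


def centroid(records):
    """Component-wise mean of a non-empty list of equal-length numeric tuples."""
    n = len(records)
    return tuple(fsum(column) / n for column in zip(*records))


def sq_dist(a, b):
    """Squared Euclidean distance between two records."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def naive_microaggregate(records, k):
    """Partition records into groups of at least k records each.

    Naive illustrative heuristic (NOT TFRP): order records by distance
    to the global centroid and cut the ordering into chunks of size k.
    A short final chunk is merged into the previous one, so every group
    has between k and 2k - 1 records.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    center = centroid(records)
    ordered = sorted(records, key=lambda r: sq_dist(r, center))
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


def information_loss(groups):
    """Assumed loss measure: within-group sum of squares over total sum of squares.

    Returns a value in [0, 1]; 0 means every record equals its group
    centroid, 1 means grouping preserved none of the variance.
    """
    all_records = [r for g in groups for r in g]
    center = centroid(all_records)
    sst = fsum(sq_dist(r, center) for r in all_records)
    sse = fsum(sq_dist(r, centroid(g)) for g in groups for r in g)
    return sse / sst if sst else 0.0


if __name__ == "__main__":
    data = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (8.0, 8.0),
            (1.0, 0.6), (9.0, 11.0), (8.0, 2.0)]
    groups = naive_microaggregate(data, k=3)
    print([len(g) for g in groups], information_loss(groups))
```

In a release, each record would be replaced by the centroid of its group, so any published value is shared by at least k respondents. Under this assumed measure, a larger k forces less similar records into the same group, which tends to raise the loss.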