SABRE: a Sensitive Attribute Bucketization and REdistribution framework for t-closeness

Authors:
Jianneng Cao;Panagiotis Karras;Panos Kalnis;Kian-Lee Tan
Affiliations:
School of Computing, National University of Singapore, Singapore, Republic of Singapore;School of Computing, National University of Singapore, Singapore, Republic of Singapore;Division of Mathematical and Computer Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia;School of Computing, National University of Singapore, Singapore, Republic of Singapore
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2011

Abstract

The publication of microdata poses a privacy threat: anonymized personal records can be re-identified by linking them with third-party data sources. Past research has sought to define a privacy guarantee that an anonymized data set should satisfy before publication, culminating in the notion of t-closeness. To satisfy t-closeness, the records in a data set need to be grouped into Equivalence Classes (ECs), such that each EC contains records with indistinguishable quasi-identifier values, and its local distribution of sensitive attribute (SA) values differs from the global distribution of SA values in the table by no more than a threshold t. However, despite this progress, previous research has not offered an anonymization algorithm tailored for t-closeness. In this paper, we fill this gap with SABRE, a Sensitive Attribute Bucketization and REdistribution framework for t-closeness. SABRE first greedily partitions a table into buckets of similar SA values and then redistributes the tuples of each bucket into dynamically determined ECs. This approach is facilitated by a property of the Earth Mover's Distance (EMD), which we employ as a measure of distribution closeness: if the tuples in an EC are picked proportionally to the sizes of the buckets they come from, then the EMD of that EC is tightly upper-bounded using localized upper bounds derived for each bucket. We prove that if the t-closeness constraint is properly obeyed during partitioning, then it is obeyed by the derived ECs too. We develop two instantiations of SABRE and extend it to a streaming environment. Our extensive experimental evaluation demonstrates that SABRE achieves information quality superior to that of schemes that merely adapt algorithms tailored for other privacy models to t-closeness, and that it can be much faster as well.
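To make the t-closeness condition concrete, the following Python sketch checks whether an EC's SA distribution lies within EMD t of the table's distribution. It is an illustration only, not the SABRE algorithm: the function names, the toy disease values, and the round-robin split are all assumptions introduced here. It relies on two standard closed forms for the EMD (under equal ground distance the EMD is half the L1 distance; on an ordered domain of m values it is the sum of absolute cumulative differences divided by m - 1), and it uses a naive round-robin split to mimic the idea of drawing tuples from each bucket in proportion to the bucket's size.

```python
from collections import Counter


def distribution(values, domain):
    """Relative frequency of each domain value in `values`."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in domain]


def emd_equal_distance(p, q):
    """EMD when every pair of distinct values is at ground distance 1
    (categorical SA); equals half the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def emd_ordered(p, q):
    """EMD on an ordered domain of m values with ground distance
    |i - j| / (m - 1); computed from cumulative differences."""
    m = len(p)
    running, total = 0.0, 0.0
    for a, b in zip(p, q):
        running += a - b
        total += abs(running)
    return total / (m - 1)


def satisfies_t_closeness(ec_values, table_values, domain, t,
                          emd=emd_equal_distance):
    """True if the EC's SA distribution is within EMD t of the table's."""
    p = distribution(ec_values, domain)
    q = distribution(table_values, domain)
    return emd(p, q) <= t


def round_robin_split(buckets, k):
    """Toy stand-in for redistribution (an assumption, not SABRE's
    procedure): deal each bucket's tuples evenly over k ECs, so each EC
    draws from every bucket roughly in proportion to the bucket size."""
    ecs = [[] for _ in range(k)]
    for bucket in buckets:
        for i, value in enumerate(bucket):
            ecs[i % k].append(value)
    return ecs


if __name__ == "__main__":
    domain = ["flu", "cancer", "hiv"]  # hypothetical SA domain
    buckets = [["flu"] * 4, ["cancer"] * 4, ["hiv"] * 2]
    table = [v for b in buckets for v in b]

    # Proportional draws reproduce the table distribution in each EC.
    for ec in round_robin_split(buckets, 2):
        print(ec, satisfies_t_closeness(ec, table, domain, t=0.1))

    # A skewed EC drifts away from the table distribution.
    skewed = ["flu", "flu", "flu", "cancer"]
    print(satisfies_t_closeness(skewed, table, domain, t=0.1))
```

In this toy setting each bucket holds a single SA value, so proportional draws reproduce the table distribution exactly. In general a bucket groups several similar SA values, which is why the abstract speaks of an upper bound on the EMD rather than an exact match.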