Probabilistic anonymity

Authors:
Sachin Lodha;Dilys Thomas
Affiliations:
Tata Research Development and Design Centre, Pune, India;Google Inc., Mountain View
Venue:
PinKDD'07 Proceedings of the 1st ACM SIGKDD international conference on Privacy, security, and trust in KDD
Year:
2007

Citing 20
Cited 5

Random sampling with a reservoir

ACM Transactions on Mathematical Software (TOMS)
Real and complex analysis, 3rd ed.

Real and complex analysis, 3rd ed.
Generalizing data to provide anonymity when disclosing information (abstract)

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Random sampling techniques for space efficient online computation of order statistics of large datasets

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Approximation algorithms

Approximation algorithms
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports

Proceedings of the 27th International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
k-anonymity: a model for protecting privacy

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Achieving k-anonymity privacy protection using generalization and suppression

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Transforming data to satisfy privacy constraints

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Primal-Dual Approximation Algorithms for Metric Facility Location and k-Median Problems

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Robust and efficient fuzzy match for online data cleaning

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Data Privacy through Optimal k-Anonymization

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
On the complexity of optimal K-anonymity

PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Incognito: efficient full-domain K-anonymity

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
On k-anonymity and the curse of dimensionality

VLDB '05 Proceedings of the 31st international conference on Very large data bases
\ell -Diversity: Privacy Beyond \kappa -Anonymity

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Selection and sorting with limited storage

SFCS '78 Proceedings of the 19th Annual Symposium on Foundations of Computer Science
Anonymizing tables

ICDT'05 Proceedings of the 10th international conference on Database Theory
Toward privacy in public databases

TCC'05 Proceedings of the Second international conference on Theory of Cryptography

PinKDD'07: privacy, security, and trust in KDD post-workshop report

ACM SIGKDD Explorations Newsletter - Special issue on visual analytics
Online anonymity protection in computer-mediated communication

IEEE Transactions on Information Forensics and Security
Data privacy using MASKETEER™

ICTAC'12 Proceedings of the 9th international conference on Theoretical Aspects of Computing
"Better than nothing" privacy with bloom filters: to what extent?

PSD'12 Proceedings of the 2012 international conference on Privacy in Statistical Databases
Negotiation-based privacy preservation scheme in internet of things platform

Proceedings of the First International Conference on Security of Internet of Things

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this age of globalization, organizations need to publish their microdata owing to legal directives or share it with business associates in order to remain competitive. This puts personal privacy at risk. To surmount this risk, attributes that clearly identify individuals, such as Name, Social Security Number, and Driving License Number, are generally removed or replaced by random values. But this may not be enough because such de-identified databases can sometimes be joined with other public databases on attributes such as Gender, Date of Birth, and Zipcode to re-identify individuals who were supposed to remain anonymous. In the literature, such an identity-leaking attribute combination is called as a quasi-identifier. It is always critical to be able to recognize quasi-identifiers and to apply to them appropriate protective measures to mitigate the identity disclosure risk posed by join attacks. In this paper, we start out by providing the first formal characterization and a practical technique to identify quasi-identifiers. We show an interesting connection between whether a set of columns forms a quasi-identifier and the number of distinct values assumed by the combination of the columns.We then use this characterization to come up with a probabilistic notion of anonymity. Again we show an interesting connection between the number of distinct values taken by a combination of columns and the anonymity it can offer. This allows us to find an ideal amount of generalization or suppression to apply to different columns in order to achieve probabilistic anonymity. We work through many examples and show that our analysis can be used to make a published database conform to privacy rules like HIPAA. In order to achieve probabilistic anonymity, we observe that one needs to solve multiple 1-dimensional k-anonymity problems. We propose many efficient and scalable algorithms for achieving 1-dimensional anonymity. Our algorithms are optimal in a sense that they minimally distort data and retain much of its utility.