Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
Real and complex analysis, 3rd ed.
Real and complex analysis, 3rd ed.
Generalizing data to provide anonymity when disclosing information (abstract)
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Approximation algorithms
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
k-anonymity: a model for protecting privacy
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Achieving k-anonymity privacy protection using generalization and suppression
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Transforming data to satisfy privacy constraints
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Primal-Dual Approximation Algorithms for Metric Facility Location and k-Median Problems
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Robust and efficient fuzzy match for online data cleaning
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Data Privacy through Optimal k-Anonymization
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
On the complexity of optimal K-anonymity
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Incognito: efficient full-domain K-anonymity
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
On k-anonymity and the curse of dimensionality
VLDB '05 Proceedings of the 31st international conference on Very large data bases
\ell -Diversity: Privacy Beyond \kappa -Anonymity
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Selection and sorting with limited storage
SFCS '78 Proceedings of the 19th Annual Symposium on Foundations of Computer Science
ICDT'05 Proceedings of the 10th international conference on Database Theory
Toward privacy in public databases
TCC'05 Proceedings of the Second international conference on Theory of Cryptography
PinKDD'07: privacy, security, and trust in KDD post-workshop report
ACM SIGKDD Explorations Newsletter - Special issue on visual analytics
Online anonymity protection in computer-mediated communication
IEEE Transactions on Information Forensics and Security
ICTAC'12 Proceedings of the 9th international conference on Theoretical Aspects of Computing
"Better than nothing" privacy with bloom filters: to what extent?
PSD'12 Proceedings of the 2012 international conference on Privacy in Statistical Databases
Negotiation-based privacy preservation scheme in internet of things platform
Proceedings of the First International Conference on Security of Internet of Things
Hi-index | 0.00 |
In this age of globalization, organizations need to publish their microdata owing to legal directives or share it with business associates in order to remain competitive. This puts personal privacy at risk. To surmount this risk, attributes that clearly identify individuals, such as Name, Social Security Number, and Driving License Number, are generally removed or replaced by random values. But this may not be enough because such de-identified databases can sometimes be joined with other public databases on attributes such as Gender, Date of Birth, and Zipcode to re-identify individuals who were supposed to remain anonymous. In the literature, such an identity-leaking attribute combination is called as a quasi-identifier. It is always critical to be able to recognize quasi-identifiers and to apply to them appropriate protective measures to mitigate the identity disclosure risk posed by join attacks. In this paper, we start out by providing the first formal characterization and a practical technique to identify quasi-identifiers. We show an interesting connection between whether a set of columns forms a quasi-identifier and the number of distinct values assumed by the combination of the columns.We then use this characterization to come up with a probabilistic notion of anonymity. Again we show an interesting connection between the number of distinct values taken by a combination of columns and the anonymity it can offer. This allows us to find an ideal amount of generalization or suppression to apply to different columns in order to achieve probabilistic anonymity. We work through many examples and show that our analysis can be used to make a published database conform to privacy rules like HIPAA. In order to achieve probabilistic anonymity, we observe that one needs to solve multiple 1-dimensional k-anonymity problems. We propose many efficient and scalable algorithms for achieving 1-dimensional anonymity. Our algorithms are optimal in a sense that they minimally distort data and retain much of its utility.