Protecting Respondents' Identities in Microdata Release. IEEE Transactions on Knowledge and Data Engineering.
k-anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.
Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.
You are what you say: privacy risks of public mentions. SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
L-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD).
M-invariance: towards privacy preserving re-publication of dynamic datasets. Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data.
Robust De-anonymization of Large Sparse Datasets. SP '08: Proceedings of the 2008 IEEE Symposium on Security and Privacy.
Myths and fallacies of "Personally Identifiable Information". Communications of the ACM.
Differential privacy: a survey of results. TAMC '08: Proceedings of the 5th International Conference on Theory and Applications of Models of Computation.
Quantitative information flow, with a view. ESORICS '11: Proceedings of the 16th European Conference on Research in Computer Security.
ICALP '06: Proceedings of the 33rd International Conference on Automata, Languages and Programming - Volume Part II.
On the feasibility of user de-anonymization from shared mobile sensor data. Proceedings of the Third International Workshop on Sensing Applications on Mobile Phones.
On the use of decentralization to enable privacy in web-scale recommendation services. Proceedings of the 12th ACM Workshop on Privacy in the Electronic Society.
There is a significant body of empirical work on statistical de-anonymization attacks against databases containing microdata about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and of the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional information about the individual, although she may not be able to uniquely identify him (an information amplification attack). We demonstrate the applicability of the analytical results by empirically verifying that the mathematical properties assumed of the database actually hold for a significant fraction of the records in the Netflix movie ratings database, which contains ratings from about 500,000 users.
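To make the isolation attack concrete, the following is a minimal sketch of a Narayanan-Shmatikov-style scoring attack, not the authors' exact algorithm. It assumes the common formulation from the Netflix study: auxiliary information and records are item-to-rating maps, rarely rated items get higher weight, and a candidate is accepted only if its score stands out from the runner-up by an eccentricity threshold. The function names, the `phi` threshold, and the toy data are all illustrative assumptions.

```python
import math
import statistics

def score(aux, record, support):
    """Weighted similarity between the adversary's auxiliary information and a
    candidate record. Items rated by few users (low support) carry more weight,
    mirroring the rare-attribute weighting in Narayanan-Shmatikov-style scoring.
    `aux` and `record` map item -> rating; `support` maps item -> number of
    users who rated that item."""
    total = 0.0
    for item, aux_rating in aux.items():
        # Count an item as matching if the ratings agree within one star.
        if item in record and abs(aux_rating - record[item]) <= 1:
            total += 1.0 / math.log(2 + support[item])
    return total

def isolate(aux, database, support, phi=1.5):
    """Isolation-attack sketch: return the best-matching record id only if its
    score exceeds the runner-up's by `phi` standard deviations of all scores
    (the eccentricity test), i.e. the match clearly stands out from the crowd.
    Otherwise return None: the adversary cannot isolate a unique record."""
    scores = {rid: score(aux, rec, support) for rid, rec in database.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    sigma = statistics.pstdev(scores.values()) or 1.0  # avoid divide-by-zero
    if (scores[best] - scores[runner_up]) / sigma >= phi:
        return best
    return None

# Hypothetical toy database: one rare movie makes user u1 isolatable.
db = {
    "u1": {"rare_movie": 5, "hit_movie": 4},
    "u2": {"hit_movie": 4},
    "u3": {"hit_movie": 3, "other_hit": 2},
}
support = {"rare_movie": 3, "hit_movie": 400_000, "other_hit": 350_000}

# Auxiliary info including the rare item isolates u1; auxiliary info
# consisting only of a popular item matches everyone and isolates no one.
print(isolate({"rare_movie": 5, "hit_movie": 4}, db, support))  # u1
print(isolate({"hit_movie": 4}, db, support))                   # None
```

Note the role of the eccentricity test: it is what separates a true isolation from a best-effort guess, which is why auxiliary information about rare items is so much more dangerous than information about popular ones.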