The cost of privacy: destruction of data-mining utility in anonymized data publishing

Authors:
Justin Brickell;Vitaly Shmatikov
Affiliations:
The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008



Abstract

Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "quasi-identifier" attributes such as ZIP code and birthdate. Their objective is usually syntactic sanitization: for example, k-anonymity requires that each "quasi-identifier" tuple appear in at least k records, while l-diversity requires that the distribution of sensitive attributes for each quasi-identifier have high entropy. The utility of sanitized data is also measured syntactically, by the number of generalization steps applied or the number of records with the same quasi-identifier. In this paper, we ask whether generalization and suppression of quasi-identifiers offer any benefits over trivial sanitization which simply separates quasi-identifiers from sensitive attributes. Previous work showed that k-anonymous databases can be useful for data mining, but k-anonymization does not guarantee any privacy. By contrast, we measure the tradeoff between privacy (how much can the adversary learn from the sanitized records?) and utility, measured as accuracy of data-mining algorithms executed on the same sanitized records. For our experimental evaluation, we use the same datasets from the UCI machine learning repository as were used in previous research on generalization and suppression. Our results demonstrate that even modest privacy gains require almost complete destruction of the data-mining utility. In most cases, trivial sanitization provides equivalent utility and better privacy than k-anonymity, l-diversity, and similar methods based on generalization and suppression.
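
The syntactic definitions in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' code. It checks k-anonymity (every quasi-identifier tuple appears in at least k records) and one common reading of the entropy condition for l-diversity (entropy of sensitive values in each group is at least log l). It also shows a simple form of trivial sanitization that publishes quasi-identifiers and sensitive values as two unlinked tables. The attribute names, the toy records, the log(l) threshold, and the sort-to-unlink step are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def group_by_quasi_identifier(records, qi_attrs):
    """Group records by their quasi-identifier tuple."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[a] for a in qi_attrs)
        groups[key].append(record)
    return groups


def is_k_anonymous(records, qi_attrs, k):
    """True if every quasi-identifier tuple appears in at least k records."""
    groups = group_by_quasi_identifier(records, qi_attrs)
    return all(len(group) >= k for group in groups.values())


def is_entropy_l_diverse(records, qi_attrs, sensitive_attr, l):
    """True if, in every group, the entropy of sensitive values is >= log(l).

    Assumption: the "high entropy" requirement is read as the usual
    entropy l-diversity threshold log(l).
    """
    groups = group_by_quasi_identifier(records, qi_attrs)
    for group in groups.values():
        counts = Counter(r[sensitive_attr] for r in group)
        n = len(group)
        entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
        if entropy < math.log(l) - 1e-12:  # small tolerance for float error
            return False
    return True


def trivial_sanitization(records, qi_attrs, sensitive_attr):
    """Publish quasi-identifiers and sensitive values as two separate tables.

    Assumption: each column set is sorted independently so that row order
    carries no link between a quasi-identifier and a sensitive value.
    """
    qi_table = sorted(tuple(r[a] for a in qi_attrs) for r in records)
    sensitive_table = sorted(r[sensitive_attr] for r in records)
    return qi_table, sensitive_table


# Hypothetical, already-generalized toy records.
records = [
    {"zip": "787**", "age": "20-29", "disease": "flu"},
    {"zip": "787**", "age": "20-29", "disease": "cancer"},
    {"zip": "787**", "age": "30-39", "disease": "flu"},
    {"zip": "787**", "age": "30-39", "disease": "flu"},
]
qi = ["zip", "age"]

print(is_k_anonymous(records, qi, 2))
print(is_entropy_l_diverse(records, qi, "disease", 2))
print(trivial_sanitization(records, qi, "disease"))
```

In this toy table the second group is 2-anonymous yet every record in it carries the same sensitive value, so an adversary who knows a person's quasi-identifier learns the disease anyway. That gap between a syntactic guarantee and what the adversary actually learns is the kind of privacy measurement the abstract contrasts with k-anonymity.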