On privacy preservation against adversarial data mining

Authors:
Charu C. Aggarwal;Jian Pei;Bo Zhang
Affiliations:
IBM T. J. Watson Research Center;Simon Fraser University, Canada;Zhejiang University, China
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Abstract

Privacy-preserving data processing has recently become an important topic because advances in hardware technology have led to the widespread proliferation of demographic and sensitive data. A rudimentary way to preserve privacy is to simply hide the information in some of the sensitive fields picked by a user. However, such a method is far from satisfactory in its ability to prevent adversarial data mining. Real data records are not randomly distributed. As a result, some fields in the records may be correlated with one another. If the correlation is sufficiently high, it may be possible for an adversary to predict some of the sensitive fields using other fields.

In this paper, we study the problem of privacy preservation against adversarial data mining, which is to hide a minimal set of entries so that the privacy of the sensitive fields is satisfactorily preserved. In other words, even by data mining, an adversary still cannot accurately recover the hidden data entries. We model the problem concisely and develop an efficient heuristic algorithm which can find good solutions in practice. An extensive performance study is conducted on both synthetic and real data sets to examine the effectiveness of our approach.
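
The correlation risk described in the abstract can be illustrated with a small Python sketch. This is a toy example only, not the paper's model or heuristic algorithm: the records, the field names, and the `predict_hidden` helper are all hypothetical, and the adversary is assumed to use a simple majority vote over records that share the visible field.

```python
from collections import Counter

# Hypothetical toy records: (occupation, disease). None marks a hidden entry.
records = [
    ("miner", "lung disease"),
    ("miner", "lung disease"),
    ("miner", "lung disease"),
    ("miner", "flu"),
    ("clerk", "flu"),
    ("clerk", "flu"),
    ("clerk", "lung disease"),
    ("miner", None),  # sensitive entry hidden by the data owner
]

def predict_hidden(records, known_idx, hidden_idx, row):
    """Guess row[hidden_idx] from rows that share the value of row[known_idx].

    Returns (guess, confidence), or (None, 0.0) when there is no evidence.
    """
    key = row[known_idx]
    if key is None:
        return None, 0.0
    # Count the visible sensitive values among records with the same key.
    votes = Counter(
        r[hidden_idx]
        for r in records
        if r[known_idx] == key and r[hidden_idx] is not None
    )
    if not votes:
        return None, 0.0
    guess, count = votes.most_common(1)[0]
    return guess, count / sum(votes.values())

# The adversary attacks the last record, where only the disease is hidden.
guess, confidence = predict_hidden(records, 0, 1, records[-1])

# Also hiding the correlated field removes this adversary's evidence.
masked_guess, masked_confidence = predict_hidden(records, 0, 1, (None, None))

print(guess, confidence)
print(masked_guess, masked_confidence)
```

In this sketch, hiding only the sensitive entry still lets the adversary guess it with 75% confidence from the visible occupation, while hiding the correlated entry as well leaves the adversary with no basis for a guess. Choosing a minimal set of such additional entries to hide is the problem the paper studies.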