On k-anonymity and the curse of dimensionality

Authors: Charu C. Aggarwal
Affiliations: IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue: VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year: 2005

Abstract

In recent years, the wide availability of personal data has made the problem of privacy preserving data mining an important one. A number of methods have recently been proposed for privacy preserving data mining of multidimensional data records. One of the methods for privacy preserving data mining is that of anonymization, in which a record is released only if it is indistinguishable from k other entities in the data. We note that methods such as k-anonymity are highly dependent upon spatial locality in order to effectively implement the technique in a statistically robust way. In high dimensional space the data becomes sparse, and the concept of spatial locality is no longer easy to define from an application point of view. In this paper, we view the k-anonymization problem from the perspective of inference attacks over all possible combinations of attributes. We show that when the data contains a large number of attributes which may be considered quasi-identifiers, it becomes difficult to anonymize the data without an unacceptably high amount of information loss. This is because an exponential number of combinations of dimensions can be used to make precise inference attacks, even when individual attributes are partially specified within a range. We provide an analysis of the effect of dimensionality on k-anonymity methods. We conclude that when a data set contains a large number of attributes which are open to inference attacks, we are faced with a choice of either completely suppressing most of the data or losing the desired level of anonymity. Thus, this paper shows that the curse of high dimensionality also applies to the problem of privacy preserving data mining.
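The dimensionality effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's analysis; it is a minimal illustration under stated assumptions: records are synthetic and uniformly distributed in [0, 1), every attribute is treated as a quasi-identifier, each attribute is generalized to one of n_bins equal-width ranges, and a record counts as anonymous when its generalized group holds at least k records (the record itself plus k - 1 others, which is the usual convention). The function names make_records, generalize, and fraction_k_anonymous are hypothetical.

```python
import random
from collections import Counter


def make_records(n_records, max_dims, seed=0):
    """Synthetic data (an assumption): uniform values in [0, 1) per attribute."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(max_dims)] for _ in range(n_records)]


def generalize(values, n_bins):
    """Replace each attribute value by the index of the range it falls in."""
    return tuple(min(int(v * n_bins), n_bins - 1) for v in values)


def fraction_k_anonymous(records, n_dims, n_bins, k):
    """Fraction of records whose generalized first n_dims attributes
    are shared by a group of at least k records."""
    keys = [generalize(r[:n_dims], n_bins) for r in records]
    group_sizes = Counter(keys)
    return sum(1 for key in keys if group_sizes[key] >= k) / len(keys)


if __name__ == "__main__":
    records = make_records(n_records=2000, max_dims=20)
    # Even with the coarsest useful generalization (two ranges per
    # attribute), the number of cells doubles with every added attribute.
    for d in (1, 2, 4, 8, 12, 16, 20):
        frac = fraction_k_anonymous(records, d, n_bins=2, k=5)
        print(f"dims={d:2d}  fraction 5-anonymous={frac:.3f}")
```

Because each added attribute only splits existing groups, the anonymous fraction can never rise as attributes are added. Keeping it high in many dimensions would require coarsening the ranges until almost nothing about each attribute is released, which is the trade-off between suppression and anonymity that the abstract describes.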