Clustering-oriented privacy-preserving data publishing

Authors:
Weiwei Ni; Zhihong Chong
Affiliations:
Department of Computer Science and Engineering, Southeast University, Nanjing 210096, PR China (both authors)
Venue:
Knowledge-Based Systems
Year:
2012

Abstract

Privacy-preserving data publishing has attracted considerable research interest in recent years. A central problem in such practice is how to trade off data utility against privacy protection. The problem becomes especially acute when the published data are used for cluster analysis: clustering relies on the differences between individual records to form groups, whereas privacy preservation aims to hide individual identities. In this paper, we propose AENDO, a mixed-mode data obfuscation method that approaches this trade-off from a novel perspective. The underlying principle is to preserve the nearest-neighbor structure of the data points while the data are obfuscated. Specifically, for each data point, AENDO divides its attributes into neighboring dispersed attributes and neighboring concentrated attributes, and then applies appropriate statistical data substitution and data swapping strategies to these two groups, respectively. Extensive experiments on UCI data sets assess the effectiveness of the approach, including a comparison with RBT, one of the best methods for preserving data utility for clustering. The results show that AENDO performs comparably to RBT in maintaining data utility for clustering and outperforms NeNDS by approximately 10%. AENDO also offers better resistance to inference attacks than both RBT and NeNDS.
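The abstract describes AENDO only at a high level, so what follows is a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes, hypothetically, that an attribute of a point counts as "neighboring concentrated" when its standard deviation over the point and its k nearest neighbors is below `ratio` times the attribute's global standard deviation, and as "neighboring dispersed" otherwise. It further assumes that the statistical substitution for dispersed attributes uses the neighborhood mean and that swapping happens only between neighbors whose values for that attribute are both concentrated. The function names (`knn`, `label_attributes`, `obfuscate`, `neighbor_preservation`) and the default values of `k` and `ratio` are illustrative only.

```python
import math
import random
from statistics import mean, pstdev


def knn(data, i, k):
    """Indices of the k nearest neighbors of data[i] (Euclidean distance), excluding i."""
    others = sorted((math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i)
    return [j for _, j in others[:k]]


def label_attributes(data, k, ratio=0.5):
    """HYPOTHETICAL rule (not from the paper): attribute a of point i is 'concentrated'
    if its spread over {i} plus the k nearest neighbors is below ratio * its global
    spread, and 'dispersed' otherwise."""
    n, d = len(data), len(data[0])
    # For a constant column use 1.0, so its (zero) local spread is labeled concentrated.
    global_sd = [pstdev([row[a] for row in data]) or 1.0 for a in range(d)]
    nbrs = [knn(data, i, k) for i in range(n)]
    labels = []
    for i in range(n):
        group = [i] + nbrs[i]
        labels.append([
            "concentrated" if pstdev([data[j][a] for j in group]) < ratio * global_sd[a]
            else "dispersed"
            for a in range(d)
        ])
    return nbrs, labels


def obfuscate(data, k=3, ratio=0.5, seed=0):
    """Sketch: substitute dispersed attributes with the neighborhood mean and swap
    concentrated attributes between neighboring points (both choices are assumptions)."""
    rng = random.Random(seed)
    nbrs, labels = label_attributes(data, k, ratio)
    n, d = len(data), len(data[0])
    out = [list(row) for row in data]
    swapped = set()  # (point, attribute) pairs already exchanged
    for i in range(n):
        for a in range(d):
            if labels[i][a] == "dispersed":
                # Statistical substitution: mean of attribute a over the k neighbors.
                out[i][a] = mean(data[j][a] for j in nbrs[i])
            elif (i, a) not in swapped:
                # Swap only with a neighbor whose attribute a is also concentrated and unswapped.
                partners = [j for j in nbrs[i]
                            if labels[j][a] == "concentrated" and (j, a) not in swapped]
                if partners:
                    j = rng.choice(partners)
                    out[i][a], out[j][a] = data[j][a], data[i][a]
                    swapped.update({(i, a), (j, a)})
    return out


def neighbor_preservation(original, obfuscated, k=3):
    """Average fraction of each point's k nearest neighbors that are unchanged after obfuscation."""
    n = len(original)
    kept = [len(set(knn(original, i, k)) & set(knn(obfuscated, i, k))) / k for i in range(n)]
    return sum(kept) / n


if __name__ == "__main__":
    rng = random.Random(42)
    # Two synthetic, well-separated 2-D clusters (made-up data for illustration only).
    data = [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)]
            for cx, cy in [(0, 0), (10, 10)] for _ in range(10)]
    masked = obfuscate(data, k=3)
    print(round(neighbor_preservation(data, masked, k=3), 2))
```

Because both operations in this sketch draw replacement values only from a point's own neighborhood, obfuscated values stay close to their neighbors, so the local structure that clustering depends on tends to survive. `neighbor_preservation` offers one rough way to check that on a given data set; it is not the utility measure used in the paper's experiments.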