Privacy-preserving data publishing for cluster analysis

Authors:
Benjamin C. M. Fung; Ke Wang; Lingyu Wang; Patrick C. K. Hung
Affiliations:
Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada H3G 1M8; School of Computing Science, Simon Fraser University, BC, Canada V5A 1S6; Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada H3G 1M8; Faculty of Business and Information Technology, University of Ontario Institute of Technology, Oshawa, ON, Canada L1H 7K4
Venue:
Data & Knowledge Engineering
Year:
2009

Abstract

Releasing person-specific data can reveal sensitive information about individuals. k-anonymization is a promising privacy protection mechanism in data publishing. Although substantial research has been conducted on k-anonymization and its extensions in recent years, only a few prior works have considered releasing data for a specific data analysis purpose. This paper presents a practical data publishing framework for generating a masked version of data that preserves both individual privacy and information usefulness for cluster analysis. Experiments on real-life data suggest that focusing on preserving cluster structure during masking yields significantly better cluster quality than masking without such focus. The major challenge of masking data for cluster analysis is the lack of class labels that could be used to guide the masking process. Our approach converts the problem into the counterpart problem for classification analysis, wherein class labels encode the cluster structure in the data, and presents a framework to evaluate the cluster quality on the masked data.
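
The idea of letting class labels encode cluster structure can be illustrated with a small Python sketch. It is not the paper's algorithm: the toy table, the fixed-width age generalization, and the helper names `generalize_age`, `is_k_anonymous`, and `label_purity` are all assumptions made for illustration. The `cluster` column stands in for labels obtained by clustering the raw data, and label purity within each masked group is used here as a simple proxy for how well a masking keeps clusters apart.

```python
from collections import Counter, defaultdict


def generalize_age(age, width):
    """Replace an exact age with a half-open interval label such as '[20-30)'."""
    low = (age // width) * width
    return f"[{low}-{low + width})"


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def label_purity(records, quasi_identifiers, label_key):
    """Fraction of records carrying the majority label of their masked group."""
    groups = defaultdict(Counter)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)][r[label_key]] += 1
    majority = sum(max(c.values()) for c in groups.values())
    return majority / len(records)


def mask(records, width):
    """Return a copy of the table with ages generalized to intervals."""
    return [{**r, "age": generalize_age(r["age"], width)} for r in records]


# Hypothetical raw table; "cluster" holds labels from clustering the raw data.
raw = [
    {"age": 23, "job": "clerk", "cluster": 0},
    {"age": 27, "job": "clerk", "cluster": 0},
    {"age": 34, "job": "clerk", "cluster": 1},
    {"age": 38, "job": "clerk", "cluster": 1},
]
qid = ["age", "job"]

masked_10 = mask(raw, 10)  # two groups: [20-30) and [30-40)
masked_20 = mask(raw, 20)  # one group: [20-40)

for name, table in [("raw", raw), ("width 10", masked_10), ("width 20", masked_20)]:
    print(name, is_k_anonymous(table, qid, 2), label_purity(table, qid, "cluster"))
```

In this toy case both masked tables satisfy 2-anonymity, but only the width-10 masking keeps the two clusters in separate groups. That is the kind of difference a label-guided masking process is meant to exploit.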