Privacy Preserving Categorical Data Analysis with Unknown Distortion Parameters

Authors:
Ling Guo; Xintao Wu
Affiliations:
Software and Information Systems Department, University of North Carolina at Charlotte, Charlotte, NC 28223, USA. e-mail: lguo2@uncc.edu
Software and Information Systems Department, University of North Carolina at Charlotte, Charlotte, NC 28223, USA. e-mail: xwu@uncc.edu
Venue:
Transactions on Data Privacy
Year:
2009

Abstract

Randomized response techniques have been investigated for privacy-preserving categorical data analysis. However, attackers can exploit the released distortion parameters to breach privacy. In this paper, we investigate whether data mining or statistical analysis tasks can still be conducted on randomized data when the distortion parameters are not disclosed to data miners. We first examine how randomization may affect various objective measures of association between two variables. We then extend the analysis to multiple variables by examining the feasibility of hierarchical loglinear modeling. Finally, we show that some classic data mining tasks cannot be applied directly to the randomized data.
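
To make the role of the distortion parameter concrete, the following is a minimal sketch of a Warner-style randomized response on a single binary attribute. It is an illustration under stated assumptions, not the scheme used in the paper: the function names, the retention probability `theta`, and the synthetic data are all hypothetical. It shows that recovering the original proportion from the released data requires knowing `theta`, which is the information the paper proposes to withhold.

```python
import random


def randomize(values, theta, rng):
    """Keep each binary value with probability theta, flip it otherwise.

    theta plays the role of the distortion parameter (an assumption of
    this sketch; the paper's actual scheme is not specified here).
    """
    return [v if rng.random() < theta else 1 - v for v in values]


def estimate_proportion(randomized, theta):
    """Estimate the original proportion of 1s from randomized data.

    Uses P(observed = 1) = pi * theta + (1 - pi) * (1 - theta),
    solved for pi. Requires theta to be known and theta != 0.5.
    """
    lam = sum(randomized) / len(randomized)  # observed proportion of 1s
    return (lam - (1 - theta)) / (2 * theta - 1)


rng = random.Random(0)

# Synthetic original data: roughly 30% of records have the sensitive value 1.
original = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]

theta = 0.8  # hypothetical distortion parameter
released = randomize(original, theta, rng)

true_proportion = sum(original) / len(original)
observed = sum(released) / len(released)
estimated = estimate_proportion(released, theta)

print(f"true proportion:      {true_proportion:.3f}")
print(f"observed (distorted): {observed:.3f}")
print(f"estimated with theta: {estimated:.3f}")
```

Without `theta`, a data miner sees only the distorted proportion and cannot apply the correction in `estimate_proportion`. This is the setting in which the abstract asks which analyses remain valid.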