Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data

Authors:
Xiao-Bai Li; Sumit Sarkar
Affiliations:
College of Management, University of Massachusetts Lowell, Lowell, Massachusetts 01854; School of Management, University of Texas at Dallas, Richardson, Texas 75080
Venue:
Information Systems Research
Year:
2006

Abstract

To respond to growing concerns about privacy of personal information, organizations that use their customers' records in data-mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organizations today is to remove identity-related attributes from the customer records before releasing them to data miners or analysts. We investigate the effect of this practice and demonstrate that many records in a data set could be uniquely identified even after identity-related attributes are removed. We propose a perturbation method for categorical data that can be used by organizations to prevent or limit disclosure of confidential data for identifiable records when the data are provided to analysts for classification, a common data-mining task. The proposed method attempts to preserve the statistical properties of the data based on privacy protection parameters specified by the organization. We show that the problem can be solved in two phases, with a linear programming formulation in Phase I (to preserve the first-order marginal distribution), followed by a simple Bayes-based swapping procedure in Phase II (to preserve the joint distribution). Experiments conducted on several real-world data sets demonstrate the effectiveness of the proposed method.
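
The abstract's observation that records can remain uniquely identifiable after identity-related attributes are removed can be illustrated with a minimal sketch. The code below is not the authors' procedure; it only counts how many records carry a combination of categorical values that no other record shares. The function name `unique_record_fraction`, the attribute names, and the toy data are all hypothetical and chosen for illustration.

```python
from collections import Counter


def unique_record_fraction(records, attributes):
    """Return the fraction of records whose combination of values on
    `attributes` appears exactly once in `records`."""
    if not records:
        return 0.0
    # Build one key per record from the chosen categorical attributes.
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    # A record is uniquely identifiable if no other record shares its key.
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Hypothetical toy data: names and IDs have already been removed.
records = [
    {"zip": "01854", "gender": "F", "occupation": "nurse"},
    {"zip": "01854", "gender": "F", "occupation": "nurse"},
    {"zip": "01854", "gender": "M", "occupation": "teacher"},
    {"zip": "75080", "gender": "F", "occupation": "engineer"},
    {"zip": "75080", "gender": "M", "occupation": "engineer"},
]

fraction = unique_record_fraction(records, ["zip", "gender", "occupation"])
print(f"Fraction of uniquely identifiable records: {fraction}")
```

In this toy data set three of the five records are unique on the three remaining attributes, even though no identity attribute is present. A perturbation method such as the one proposed in the paper would target records of this kind.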