Random-data perturbation techniques and privacy-preserving data mining

  • Authors:
  • Hillol Kargupta; Souptik Datta; Qi Wang; Krishnamoorthy Sivakumar

  • Affiliations:
  • University of Maryland, Baltimore County, Department of Computer Science and Electrical Engineering, Baltimore, MD 21250, USA (Kargupta, Datta); Washington State University, School of Electrical Engineering and Computer Science, Pullman, WA, USA (Wang, Sivakumar)

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2005

Abstract

Privacy is becoming an increasingly important issue in many data-mining applications, which has triggered the development of many privacy-preserving data-mining techniques. A large fraction of these techniques use randomized data distortion to mask the data and thereby preserve the privacy of sensitive values. This methodology attempts to hide the sensitive data by randomly modifying the data values, often using additive noise. This paper questions the utility of such random-value distortion for privacy preservation. It first notes that random matrices have predictable structure in the spectral domain, and it then develops a random-matrix-based spectral-filtering technique to retrieve the original data from a dataset distorted by the addition of random values. The proposed method works by comparing the spectrum generated from the observed data with that of random matrices. The paper presents the theoretical foundation and extensive experimental results to demonstrate that, in many cases, random-data distortion preserves very little privacy. The analytical framework also points out several possible avenues for developing new privacy-preserving data-mining techniques, such as algorithms that explicitly guard against privacy breaches through linear transformations and algorithms that exploit multiplicative or colored noise for preserving privacy in data-mining applications.
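
To make the abstract's idea concrete, the sketch below illustrates the kind of spectral filtering it describes: eigenvalues of the perturbed data's covariance matrix that fall within the Marchenko-Pastur bound expected of a pure-noise random matrix are treated as noise, and the observed data are projected onto the remaining eigenvectors. This is a minimal Python/NumPy sketch assuming i.i.d. additive Gaussian noise of known variance; the function name `spectral_filter` and the exact reconstruction step are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of random-matrix-based spectral filtering (illustrative only).
# Assumes U = X + V, where V is i.i.d. Gaussian noise with known variance.
import numpy as np

def spectral_filter(U, noise_var):
    """Estimate the original data X from noise-perturbed data U.

    U         : (n_samples, n_features) perturbed data matrix
    noise_var : variance of the additive i.i.d. noise
    """
    n, m = U.shape
    Uc = U - U.mean(axis=0)                 # center the observed data

    # Sample covariance of the perturbed data.
    cov = (Uc.T @ Uc) / n

    # Marchenko-Pastur upper bound: eigenvalues of a pure-noise covariance
    # matrix fall (asymptotically) below this threshold.
    lam_max = noise_var * (1 + np.sqrt(m / n)) ** 2

    # Keep only eigenvectors whose eigenvalues exceed the noise bound;
    # these span the estimated "signal" subspace.
    eigvals, eigvecs = np.linalg.eigh(cov)
    signal = eigvecs[:, eigvals > lam_max]

    # Project the perturbed data onto the signal subspace to filter out
    # most of the additive noise.
    return Uc @ signal @ signal.T + U.mean(axis=0)

# Usage: distort a low-rank dataset with Gaussian noise, then try to recover it.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3)) @ rng.normal(size=(3, 10))   # rank-3 "original" data
sigma2 = 1.0
U = X + rng.normal(scale=np.sqrt(sigma2), size=X.shape)     # randomly distorted copy
X_hat = spectral_filter(U, sigma2)
print("RMSE of perturbed data :", np.sqrt(np.mean((U - X) ** 2)))
print("RMSE of filtered data  :", np.sqrt(np.mean((X_hat - X) ** 2)))
```

On a low-rank dataset like the one above, the filtered estimate typically has a noticeably lower reconstruction error than the raw perturbed data, which is precisely the privacy concern the paper raises about additive random-value distortion.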