Privacy preserving linear discriminant analysis from perturbed data

Authors:
Somnath Chakrabarti;Zhiyuan Chen;Aryya Gangopadhyay;Shibnath Mukherjee
Affiliations:
University of Maryland, Baltimore, MD;University of Maryland, Baltimore, MD;University of Maryland, Baltimore, MD;Yahoo! Research and Development, India and University of Maryland, Baltimore, MD
Venue:
Proceedings of the 2010 ACM Symposium on Applied Computing
Year:
2010

Abstract

The ubiquity of the internet not only makes it very convenient for individuals and organizations to share data for data mining or statistical analysis, but also greatly increases the risk of privacy breaches. Many techniques, such as random perturbation, exist to protect the privacy of such data sets. However, perturbation often degrades the quality of data mining or statistical analysis conducted over the perturbed data. This paper studies the impact of random perturbation on a popular data mining and analysis method: linear discriminant analysis. The contributions are twofold. First, we find that for large data sets the impact of perturbation is quite limited (i.e., high-quality results can be obtained directly from the perturbed data) if the perturbation process satisfies certain conditions. Second, we find that for small data sets the negative impact of perturbation can be reduced by publishing additional statistics about the perturbation along with the perturbed data. We provide both theoretical derivations and experimental verification of these results.
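The abstract does not spell out the perturbation model, the conditions, or which statistics are published. The sketch below is only an illustration of the general idea under assumptions of our own, not the authors' procedure. It assumes additive, independent, zero-mean Gaussian noise with a known variance `sigma**2` on each attribute, two synthetic 2-D classes with a shared correlated covariance, and that the published statistic is that noise variance. Under this model the class means stay unbiased, while the pooled within-class covariance is inflated by `sigma**2 * I`, which tilts the Fisher direction `S_w^{-1}(m1 - m0)`. Subtracting the published noise variance from the diagonal removes that inflation. All function names (`sample_class`, `perturb`, `lda_direction`, and so on) are hypothetical.

```python
import math
import random

# Hypothetical setup (not from the paper): two 2-D classes sharing the
# covariance [[1, 0.8], [0.8, 1]], with means (0, 0) and (1, 0).
def sample_class(n, mean, rng):
    pts = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # The Cholesky factor of [[1, 0.8], [0.8, 1]] is [[1, 0], [0.8, 0.6]].
        pts.append((mean[0] + z1, mean[1] + 0.8 * z1 + 0.6 * z2))
    return pts

def perturb(pts, sigma, rng):
    # Assumed perturbation model: independent zero-mean Gaussian noise per attribute.
    return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in pts]

def mean(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def pooled_cov(classes):
    # Pooled within-class covariance, returned as (s00, s01, s11).
    s00 = s01 = s11 = 0.0
    total = 0
    for pts in classes:
        mx, my = mean(pts)
        for x, y in pts:
            dx, dy = x - mx, y - my
            s00 += dx * dx
            s01 += dx * dy
            s11 += dy * dy
        total += len(pts)
    k = total - len(classes)
    return (s00 / k, s01 / k, s11 / k)

def lda_direction(c0, c1, noise_var=0.0):
    # Fisher direction w = S_w^{-1} (m1 - m0). A nonzero noise_var (the assumed
    # published statistic) is subtracted from the diagonal of S_w first.
    s00, s01, s11 = pooled_cov([c0, c1])
    s00 -= noise_var
    s11 -= noise_var
    m0, m1 = mean(c0), mean(c1)
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    det = s00 * s11 - s01 * s01
    # Closed-form 2x2 inverse applied to the mean difference.
    return ((s11 * d0 - s01 * d1) / det, (-s01 * d0 + s00 * d1) / det)

def cosine(u, v):
    # Absolute cosine, since an LDA direction is only defined up to sign.
    dot = u[0] * v[0] + u[1] * v[1]
    return abs(dot) / (math.hypot(*u) * math.hypot(*v))

rng = random.Random(0)
sigma = 1.0
c0 = sample_class(5000, (0.0, 0.0), rng)
c1 = sample_class(5000, (1.0, 0.0), rng)
p0, p1 = perturb(c0, sigma, rng), perturb(c1, sigma, rng)

true_w = (1.0, -0.8)  # population direction: inverse covariance times (1, 0), up to scale
naive = lda_direction(p0, p1)
corrected = lda_direction(p0, p1, noise_var=sigma ** 2)
print(f"naive cosine:     {cosine(naive, true_w):.3f}")
print(f"corrected cosine: {cosine(corrected, true_w):.3f}")
```

In this toy setting the naive direction computed directly from the perturbed data is noticeably rotated away from the true one, while the noise-corrected direction nearly coincides with it. The example stays in two dimensions and pure Python so that the matrix inverse has a closed form. It says nothing about the specific conditions or statistics analyzed in the paper.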