The ubiquity of the Internet makes it convenient for individuals and organizations to share data for data mining or statistical analysis, but it also greatly increases the risk of privacy breaches. Many techniques, such as random perturbation, exist to protect the privacy of shared data sets. However, perturbation often degrades the quality of data mining or statistical analysis conducted over the perturbed data. This paper studies the impact of random perturbation on a popular data mining and analysis method: linear discriminant analysis. The contributions are twofold. First, we show that for large data sets the impact of perturbation is quite limited (i.e., high-quality results may be obtained directly from the perturbed data) if the perturbation process satisfies certain conditions. Second, we show that for small data sets the negative impact of perturbation can be reduced by publishing additional statistics about the perturbation along with the perturbed data. We provide both theoretical derivations and experimental verification of these results.
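To make the setting concrete, the following is a minimal sketch and not the paper's method. It assumes additive, zero-mean Gaussian noise as the perturbation and treats the noise covariance as the "additional statistic" published with the data; both are illustrative assumptions, as are all function names and parameter values. It compares the two-class Fisher discriminant direction estimated from clean data, from perturbed data, and from perturbed data after subtracting the published noise covariance from the pooled within-class covariance.

```python
import numpy as np


def perturb(X, noise_std, rng):
    """Additive zero-mean Gaussian noise (an assumed perturbation scheme)."""
    return X + rng.normal(0.0, noise_std, size=X.shape)


def fisher_direction(X, y, noise_cov=None):
    """Unit-length two-class Fisher direction, w proportional to S_w^{-1} (mu1 - mu0).

    If noise_cov is given, it is subtracted from the pooled within-class
    covariance before solving (hypothetical use of a published statistic).
    """
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    pooled = ((n0 - 1) * np.cov(X0, rowvar=False)
              + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    if noise_cov is not None:
        pooled = pooled - noise_cov
    w = np.linalg.solve(pooled, X1.mean(axis=0) - X0.mean(axis=0))
    return w / np.linalg.norm(w)


def demo(seed=0, n_per_class=200_000, noise_std=1.0):
    """Return cosine similarity to the clean direction: (naive, corrected)."""
    rng = np.random.default_rng(seed)
    # Illustrative class-conditional Gaussians sharing one covariance.
    cov = np.array([[4.0, 1.5], [1.5, 1.0]])
    X = np.vstack([
        rng.multivariate_normal([0.0, 0.0], cov, size=n_per_class),
        rng.multivariate_normal([2.0, 1.0], cov, size=n_per_class),
    ])
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])

    X_pert = perturb(X, noise_std, rng)
    published_noise_cov = (noise_std ** 2) * np.eye(X.shape[1])

    w_clean = fisher_direction(X, y)
    w_naive = fisher_direction(X_pert, y)
    w_corrected = fisher_direction(X_pert, y, noise_cov=published_noise_cov)
    return float(w_clean @ w_naive), float(w_clean @ w_corrected)


if __name__ == "__main__":
    cos_naive, cos_corrected = demo()
    print(f"cosine to clean direction, naive:     {cos_naive:.4f}")
    print(f"cosine to clean direction, corrected: {cos_corrected:.4f}")
```

In this toy setup the noise is isotropic while the data covariance is not, so the uncorrected direction is tilted away from the clean one; subtracting the noise covariance moves the estimate back toward it. Whether this matches the conditions or statistics the paper has in mind is not stated in the abstract.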