A Knowledge Model Sharing Based Approach to Privacy-Preserving Data Mining
Transactions on Data Privacy
We address the problem of publishing a Naïve Bayesian Classifier (NBC) or, equivalently, publishing the views necessary for building an NBC, while protecting the privacy of the individuals who provided the training data. Our approach fully preserves the accuracy of the original classifier, and thus significantly improves on current approaches, such as randomization or anonymization, which typically trade accuracy for privacy. Existing query-view security checkers answer the question "Is the view safe to publish?" and are computationally expensive (often Π₂ᵖ-complete). Here, instead, we tackle the question "How can a view be made safe to publish?" and propose a linear-time algorithm for publishing safe NBC-enabling views. We first show that a simple measure that restricts the ratios between the published NBC statistics is sufficient to prevent any breach of privacy. We then propose a linear-time algorithm that enforces this measure by producing perturbed statistics that guarantee both (i) the privacy of individuals, and (ii) a classifier that behaves identically to the NBC trained on the original data. By carefully expressing the derived statistics as rational numbers, we can easily produce synthetic (sanitized) datasets. Thus, for any given dataset, we produce another dataset that is secure to publish (w.r.t. a uniform prior) and achieves the same classification accuracy. Finally, we extend our results by providing sufficient conditions for coping with arbitrary (non-uniform prior) distributions, and we validate their effectiveness in practice through experiments on real-world data.
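The claim that perturbed statistics can "behave identically to the NBC trained on the original data" rests on a property worth making concrete: an NBC's prediction depends only on ratios among the published statistics, not on their absolute values. The sketch below (an illustration of that property, not the paper's actual perturbation algorithm) trains a count-based NBC and then rescales every count for a given attribute by a constant factor shared across classes; since every class score is multiplied by the same product of factors, the argmax, and hence every prediction, is unchanged. All function and variable names here are illustrative.

```python
# Illustrative sketch: NBC predictions are invariant under per-attribute
# rescaling of the published counts, because every class score is scaled
# by the same constant. (The paper's actual algorithm additionally bounds
# the ratios between published statistics to guarantee privacy.)
from collections import defaultdict

def nbc_train(rows, labels):
    """Collect class priors and per-(attribute, value) class counts."""
    prior = defaultdict(int)
    cond = defaultdict(lambda: defaultdict(int))  # cond[(attr, value)][cls]
    for row, cls in zip(rows, labels):
        prior[cls] += 1
        for attr, value in enumerate(row):
            cond[(attr, value)][cls] += 1
    return prior, cond

def nbc_predict(prior, cond, row):
    """Return argmax over classes c of P(c) * prod_i P(a_i | c)."""
    n = sum(prior.values())
    best, best_score = None, -1.0
    for cls, pc in prior.items():
        score = pc / n
        for attr, value in enumerate(row):
            score *= cond[(attr, value)][cls] / pc
        if score > best_score:
            best, best_score = cls, score
    return best

def rescale(cond, factors):
    """Multiply every count for attribute i by factors[i] (the same
    factor for all classes). Each class score is then scaled by
    prod(factors.values()), so the argmax never changes."""
    out = defaultdict(lambda: defaultdict(int))
    for (attr, value), per_cls in cond.items():
        for cls, cnt in per_cls.items():
            out[(attr, value)][cls] = cnt * factors[attr]
    return out
```

For example, publishing `rescale(cond, {0: 3, 1: 7})` instead of `cond` yields a classifier whose decisions on every input coincide with those of the original, which is the invariance the paper's ratio-restricting measure exploits.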