PrivGene: differentially private model fitting using genetic algorithms

Authors:
Jun Zhang; Xiaokui Xiao; Yin Yang; Zhenjie Zhang; Marianne Winslett
Affiliations:
Nanyang Technological University, Singapore, Singapore; Nanyang Technological University, Singapore, Singapore; Advanced Digital Sciences Center, Singapore, Singapore; Advanced Digital Sciences Center, Singapore, Singapore; University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013

Abstract

ε-differential privacy is rapidly emerging as the state-of-the-art scheme for protecting individuals' privacy in published analysis results over sensitive data. The main idea is to randomly perturb the analysis results so that any individual's presence in the data has a negligible impact on the randomized results. This paper focuses on analysis tasks that involve model fitting, i.e., finding the parameters of a statistical model that best fit the dataset. For such tasks, the quality of the differentially private results depends on both the effectiveness of the model-fitting algorithm and the amount of perturbation required to satisfy the privacy guarantees. Most previous studies start from a state-of-the-art, non-private model-fitting algorithm and develop a differentially private version of it. Unfortunately, many model-fitting algorithms require heavy perturbation to satisfy ε-differential privacy, leading to poor overall result quality. Motivated by this, we propose PrivGene, a general-purpose differentially private model-fitting solution based on genetic algorithms (GA). PrivGene needs significantly less perturbation than previous methods and achieves higher overall result quality, even for model-fitting tasks where GA is not the first choice in the absence of privacy considerations. Further, PrivGene performs the random perturbations using a novel technique called the enhanced exponential mechanism, which improves on the exponential mechanism by exploiting the special properties of model-fitting tasks. As case studies, we apply PrivGene to three common analysis tasks involving model fitting: logistic regression, SVM classification, and k-means clustering. Extensive experiments on real data confirm the high result quality of PrivGene and its superiority over existing methods.
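To make the ingredients named above concrete, here is a minimal sketch of differentially private fitting with a genetic algorithm. It is not PrivGene: the abstract does not describe the enhanced exponential mechanism, so the sketch uses only the standard exponential mechanism of McSherry and Talwar, which returns a candidate c with probability proportional to exp(ε·u(c)/(2Δu)), where u is a data-dependent score and Δu its sensitivity. Everything else is a hypothetical choice for illustration: the function names `exponential_mechanism` and `private_ga_fit`, the one-dimensional threshold classifier, the fitness (negated misclassification count), the even budget split ε/(generations·survivors), and the averaging crossover with Gaussian mutation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def exponential_mechanism(
    candidates: Sequence[T],
    score: Callable[[T], float],
    epsilon: float,
    sensitivity: float,
    rng: random.Random,
) -> T:
    """Standard exponential mechanism: return candidate c with probability
    proportional to exp(epsilon * score(c) / (2 * sensitivity))."""
    scores = [score(c) for c in candidates]
    top = max(scores)
    # Shifting by the max score leaves the probabilities unchanged but avoids overflow.
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(list(candidates), weights=weights, k=1)[0]


def private_ga_fit(
    data: Sequence[Tuple[float, int]],
    epsilon: float,
    generations: int = 10,
    population: int = 8,
    survivors: int = 2,
    seed: int = 0,
) -> float:
    """Hypothetical toy GA fitting a 1-D threshold classifier (predict 1 iff x >= t).
    Requires 2 <= survivors <= population."""
    rng = random.Random(seed)
    # generations * survivors private selections in total; by sequential
    # composition the whole run satisfies epsilon-differential privacy.
    eps_each = epsilon / (generations * survivors)

    def fitness(t: float) -> float:
        # Negated misclassification count; adding, removing, or changing one
        # record changes it by at most 1, so its sensitivity is 1.
        return float(-sum(1 for x, y in data if (x >= t) != (y == 1)))

    # The initial population is drawn without looking at the data.
    pool: List[float] = [rng.uniform(-5.0, 5.0) for _ in range(population)]
    parents: List[float] = pool[:survivors]
    for _ in range(generations):
        remaining = list(pool)
        parents = []
        for _ in range(survivors):
            chosen = exponential_mechanism(remaining, fitness, eps_each, 1.0, rng)
            parents.append(chosen)
            remaining.remove(chosen)
        # Crossover (averaging two parents) and Gaussian mutation touch only
        # privately selected parents, so they consume no privacy budget.
        pool = list(parents)
        while len(pool) < population:
            a, b = rng.sample(parents, 2)
            pool.append((a + b) / 2.0 + rng.gauss(0.0, 0.5))
    # Return the first survivor of the last private selection round.
    return parents[0]
```

The only step that reads the data is the selection, so the perturbation budget is spent there. Charging every selection separately, as this sketch does, is the simplest correct accounting. It is also costly: a small ε split over many rounds makes each choice close to uniform. Reducing that cost is the kind of improvement the abstract attributes to the enhanced exponential mechanism.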