An efficient algorithm for sequential random sampling
ACM Transactions on Mathematical Software (TOMS)
Rough Sets: Theoretical Aspects of Reasoning about Data
Rough Sets: Theoretical Aspects of Reasoning about Data
Optimal Histograms with Quality Guarantees
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Privacy, accuracy, and consistency too: a holistic solution to contingency table release
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Mechanism Design via Differential Privacy
FOCS '07 Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science
Differential privacy: a survey of results
TAMC'08 Proceedings of the 5th international conference on Theory and applications of models of computation
Differentially private combinatorial optimization
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Differentially private data release through multidimensional partitioning
SDM'10 Proceedings of the 7th VLDB conference on Secure data management
Boosting the accuracy of differentially private histograms through consistency
Proceedings of the VLDB Endowment
Differentially private data release for data mining
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Differentially Private Empirical Risk Minimization
The Journal of Machine Learning Research
ICALP'06 Proceedings of the 33rd international conference on Automata, Languages and Programming - Volume Part II
Calibrating noise to sensitivity in private data analysis
TCC'06 Proceedings of the Third conference on Theory of Cryptography
Differentially Private Histogram Publication
ICDE '12 Proceedings of the 2012 IEEE 28th International Conference on Data Engineering
Hi-index | 0.00 |
Privacy concerns are among the major barriers to efficient secondary use of information and data on humans. Differential privacy is a relatively recent measure that has received much attention in machine learning as it quantifies individual risk using a strong cryptographically motivated notion of privacy. At the core of differential privacy lies the concept of information dissemination through a randomized process. One way of adding the needed randomness to any process is to pre-randomize the input. This can yield lower quality results than other more specialized approaches, but can be an attractive alternative when i. there does not exist a specialized differentially private alternative, or when ii. multiple processes applied in parallel can use the same pre-randomized input. A simple way to do input randomization is to compute perturbed histograms, which essentially are noisy multiset membership functions. Unfortunately, computation of perturbed histograms is only efficient when the data stems from a low-dimensional discrete space. The restriction to discrete spaces can be mitigated by discretization; Lei presented in 2011 an analysis of discretization in the context of M-estimators. Here we address the restriction regarding the dimensionality of the data. In particular we present a differentially private approximation algorithm for selecting features that preserve conditional frequency densities, and use this to project data prior to computing differentially private histograms. The resulting projected histograms can be used as machine learning input and include the necessary randomness for differential privacy. We empirically validate the use of differentially private projected histograms for learning binary and multinomial logistic regression models using four real world data sets.