Many organizations, such as the U.S. Census, publicly release samples of the data that they collect about private citizens. These datasets are first anonymized using various techniques, and then a small sample is released to enable “do-it-yourself” calculations. This paper investigates the privacy of the second step of this process: sampling. We observe that rare values (values that occur with low frequency in the table) can be problematic from a privacy perspective. To our knowledge, this is the first work that quantitatively examines the relationship between the number of rare values in a table and the privacy of a released random sample. If we require ε-privacy (where a larger ε means a weaker privacy guarantee) with probability at least 1 − δ, we say that a value is rare if it occurs in at most $\tilde{O}(\frac{1}{\epsilon})$ rows of the table (ignoring log factors). If there are no rare values, we establish a direct connection between the sample size that is safe to release and privacy. Specifically, if we select each row of the table with probability at most ε, then the sample is O(ε)-private with high probability. If there are t rare values, the sample is $\tilde{O}(\epsilon \delta /t)$-private with probability at least 1 − δ.
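The two ingredients named in the abstract, row-level sampling with probability at most ε and a frequency threshold for rare values, can be illustrated with a minimal sketch. This is not the paper's algorithm or analysis: the function names `bernoulli_sample` and `rare_values` are hypothetical, the table is toy data, and the threshold is taken as exactly 1/ε, an assumption that drops the constants and log factors hidden in the $\tilde{O}$ notation.

```python
import random
from collections import Counter


def bernoulli_sample(rows, p, rng=None):
    """Keep each row independently with probability p (hypothetical helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so each row survives with probability p.
    return [row for row in rows if rng.random() < p]


def rare_values(rows, eps):
    """Return the values that occur in at most 1/eps rows.

    Assumption: the threshold is exactly 1/eps; the abstract's bound
    hides constants and log factors that this sketch ignores.
    """
    threshold = 1.0 / eps
    counts = Counter(rows)
    return {value for value, count in counts.items() if count <= threshold}


# Toy single-column table: "c" appears in only 3 rows.
table = ["a"] * 500 + ["b"] * 300 + ["c"] * 3
eps = 0.1

rare = rare_values(table, eps)
sample = bernoulli_sample(table, eps, random.Random(0))
print("rare values:", rare)
print("sample size:", len(sample), "of", len(table))
```

With ε = 0.1 the assumed threshold is 10 rows, so only `"c"` is flagged as rare. Counting the flagged values gives the quantity t that appears in the abstract's final bound.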