A security machanism for statistical database
ACM Transactions on Database Systems (TODS)
Privacy-preserving data mining
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
On the design and quantification of privacy preserving data mining algorithms
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Advances in Inference Control in Statistical Databases: An Overview
Inference Control in Statistical Databases, From Theory to Practice
Secure and private sequence comparisons
Proceedings of the 2003 ACM workshop on Privacy in the electronic society
Practical privacy: the SuLQ framework
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Models and Methods for Privacy-Preserving Data Analysis and Publishing
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Smooth sensitivity and sampling in private data analysis
Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
Privacy, accuracy, and consistency too: a holistic solution to contingency table release
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Auditing and Inference Control in Statistical Databases
IEEE Transactions on Software Engineering
Towards Practical Privacy for Genomic Computation
SP '08 Proceedings of the 2008 IEEE Symposium on Security and Privacy
ICALP'06 Proceedings of the 33rd international conference on Automata, Languages and Programming - Volume Part II
Calibrating noise to sensitivity in private data analysis
TCC'06 Proceedings of the Third conference on Theory of Cryptography
Differentially Private Empirical Risk Minimization
The Journal of Machine Learning Research
To release or not to release: evaluating information leaks in aggregate human-genome data
ESORICS'11 Proceedings of the 16th European conference on Research in computer security
Compressive mechanism: utilizing sparse representation in differential privacy
Proceedings of the 10th annual ACM workshop on Privacy in the electronic society
ACM SIGCOMM Computer Communication Review
Differential privacy in data publication and analysis
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Genodroid: are privacy-preserving genomic tests ready for prime time?
Proceedings of the 2012 ACM workshop on Privacy in the electronic society
Privacy-preserving data exploration in genome-wide association studies
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Addressing the concerns of the lacks family: quantification of kin genomic privacy
Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security
Protecting and evaluating genomic privacy in medical tests and personalized medicine
Proceedings of the 12th ACM workshop on Workshop on privacy in the electronic society
Secure genomic testing with size- and position-hiding private substring matching
Proceedings of the 12th ACM workshop on Workshop on privacy in the electronic society
Differentially private histogram publication
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.00 |
Genome-wide association studies (GWAS) aim at discovering the association between genetic variations, particularly single-nucleotide polymorphism (SNP), and common diseases, which is well recognized to be one of the most important and active areas in biomedical research. Also renowned is the privacy implication of such studies, which has been brought into the limelight by the recent attack proposed by Homer et al. Homer's attack demonstrates that it is possible to identify a GWAS participant from the allele frequencies of a large number of SNPs. Such a threat, unfortunately, was found in our research to be significantly understated. In this paper, we show that individuals can actually be identified from even a relatively small set of statistics, as those routinely published in GWAS papers. We present two attacks. The first one extends Homer's attack with a much more powerful test statistic, based on the correlations among different SNPs described by coefficient of determination (r2). This attack can determine the presence of an individual from the statistics related to a couple of hundred SNPs. The second attack can lead to complete disclosure of hundreds of participants' SNPs, through analyzing the information derived from published statistics. We also found that those attacks can succeed even when the precisions of the statistics are low and part of data is missing. We evaluated our attacks on the real human genomes and concluded that such threats are completely realistic.