A Cryptographic Approach to Securely Share and Query Genomic Sequences

Authors:
M. Kantarcioglu;Wei Jiang;Ying Liu;B. Malin
Affiliations:
Dept. of Comput. Sci., Univ. of Texas, Dallas, TX;-;-;-
Venue:
IEEE Transactions on Information Technology in Biomedicine
Year:
2008

Abstract

To support large-scale biomedical research projects, organizations need to share person-specific genomic sequences without violating the privacy of their data subjects. In the past, organizations protected subjects' identities by removing identifiers, such as name and social security number; however, recent investigations illustrate that deidentified genomic data can be "reidentified" to named individuals using simple automated methods. In this paper, we present a novel cryptographic framework that enables organizations to support genomic data mining without disclosing the raw genomic sequences. Organizations contribute encrypted genomic sequence records into a centralized repository, where the administrator can perform queries, such as frequency counts, without decrypting the data. We evaluate the efficiency of our framework with existing databases of single nucleotide polymorphism (SNP) sequences and demonstrate that the time needed to complete count queries is feasible for real-world applications. For example, our experiments indicate that a count query over 40 SNPs in a database of 5000 records can be completed in approximately 30 min with off-the-shelf technology. We further show that approximation strategies can be applied to significantly speed up query execution times with minimal loss in accuracy. The framework can be implemented on top of existing information and network technologies in biomedical environments.
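
The abstract does not say which encryption scheme the framework uses or how a query is evaluated, so the following is only an illustrative sketch of the general idea of counting over encrypted records. It assumes an additively homomorphic scheme (a toy Paillier construction with tiny, insecure primes) and assumes each record has already been reduced to an encrypted 0/1 indicator for a single SNP value. The function names (`keygen`, `encrypt`, `add`, `decrypt`, `encrypted_count`) and the separate key holder who decrypts the total are hypothetical and are not taken from the paper.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key generation. The primes are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n, priv, c):
    """Recover the plaintext using the private values (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def encrypted_count(n, ciphertexts):
    """Sum encrypted 0/1 indicators without decrypting any single record."""
    total = encrypt(n, 0)
    for c in ciphertexts:
        total = add(n, total, c)
    return total


# Hypothetical data: 1 means the record carries the queried SNP value.
n, priv = keygen()
indicators = [1, 0, 1, 1, 0]
stored = [encrypt(n, bit) for bit in indicators]  # what the repository holds
enc_total = encrypted_count(n, stored)            # computed on ciphertexts only
count = decrypt(n, priv, enc_total)               # done by the key holder; equals 3
```

In this sketch the repository only ever handles ciphertexts, and only the aggregate is decrypted. A query over many SNPs, such as the 40-SNP example in the abstract, would need additional machinery that this sketch does not attempt to reproduce.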