We present the motivation, use case, and requirements of a clinical case research network that would allow biomedical researchers to perform retrospective analysis on de-identified clinical cases joined across a large-scale (nationwide) distributed network. Building on semi-join adaptive plans for fusion queries, we discuss how joining can be done in a way that protects the privacy of the individual patients involved. Our method is based on a cryptographically strong keyed-hash algorithm (HMAC). The hash values are truncated, and the resulting hash collisions in semi-join filters are exploited to limit the ability of an apprentice-site to re-identify patients in the filter. As a measure of privacy we use likelihood ratios. Since the join key is based on real person identifiers, we need to apply the methods of record linkage to hashing and semi-join filters. We find that multiple disjunctive rules, as used in deterministic matching, lead to a higher privacy risk than rules based on a single identifier vector.
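The idea of a truncated keyed-hash semi-join filter can be illustrated with a minimal Python sketch. It is not the paper's implementation: the choice of SHA-256 as the underlying hash, the `bits` truncation parameter, the shared `key`, and the function names `truncated_hmac`, `build_filter`, and `semi_join` are all assumptions made for illustration. The sketch only shows that truncation keeps every true match while letting unrelated identifiers collide with entries in the filter.

```python
import hashlib
import hmac


def truncated_hmac(key: bytes, identifier: str, bits: int = 16) -> int:
    """Return the leading `bits` bits of HMAC-SHA256(key, identifier).

    SHA-256 and the default of 16 bits are illustrative assumptions.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).digest()
    value = int.from_bytes(digest, "big")
    # Keep only the most significant `bits` bits of the 256-bit digest.
    return value >> (len(digest) * 8 - bits)


def build_filter(key: bytes, identifiers, bits: int = 16) -> set:
    """Build a semi-join filter: the set of truncated hashes of the join keys."""
    return {truncated_hmac(key, ident, bits) for ident in identifiers}


def semi_join(key: bytes, local_identifiers, remote_filter: set, bits: int = 16) -> list:
    """Return local identifiers whose truncated hash appears in the filter.

    The result contains every true match, plus possible false positives
    caused by hash collisions introduced by truncation.
    """
    return [
        ident
        for ident in local_identifiers
        if truncated_hmac(key, ident, bits) in remote_filter
    ]


# Hypothetical data: two sites sharing a secret key and a few identifiers.
shared_key = b"hypothetical-shared-secret"
site_a = ["patient-%d" % i for i in range(0, 50)]
site_b = ["patient-%d" % i for i in range(40, 400)]

filter_a = build_filter(shared_key, site_a, bits=8)
candidates = semi_join(shared_key, site_b, filter_a, bits=8)
true_matches = sorted(set(site_a) & set(site_b))
```

With a shorter truncation, more identifiers at the receiving site fall into the candidate set without being true matches, which is the ambiguity the abstract describes as limiting re-identification. The receiving site cannot tell from the filter alone which candidates are real, at the cost of shipping extra rows in the join.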