A constraint satisfaction cryptanalysis of Bloom filters in private record linkage

Authors:
Mehmet Kuzu; Murat Kantarcioglu; Elizabeth Durham; Bradley Malin
Affiliations:
Dept. of Computer Science, University of Texas at Dallas, Richardson, TX; Dept. of Computer Science, University of Texas at Dallas, Richardson, TX; Dept. of Biomedical Informatics, Vanderbilt University, Nashville, TN; Dept. of Biomedical Informatics, Vanderbilt University, Nashville, TN
Venue:
PETS'11 Proceedings of the 11th international conference on Privacy enhancing technologies
Year:
2011

Abstract

For over fifty years, "record linkage" procedures have been refined to integrate data in the face of typographical and semantic errors. These procedures are traditionally performed over personal identifiers (e.g., names), but in modern decentralized environments, privacy concerns have led to regulations that require the obfuscation of such attributes. Various techniques have been proposed to resolve the tension, including secure multi-party computation protocols; however, such protocols are computationally intensive and do not scale to real-world linkage scenarios. More recently, procedures based on Bloom filter encoding (BFE) have gained traction in various applications, such as healthcare, where they yield highly accurate record linkage results in a reasonable amount of time. Though this model is promising, no formal security analysis has been designed for or applied to it, which is of concern considering the sensitivity of the corresponding data. In this paper, we introduce a novel attack, based on constraint satisfaction, to provide a rigorous analysis of BFE, along with guidelines on how to mitigate the risk posed by the attack. In addition, we conduct an empirical analysis with data derived from public voter records to illustrate the feasibility of the attack. Our investigations show that the parameters of the BFE protocol can be configured to make it relatively resilient to the proposed attack without significant reduction in record linkage performance.
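
For readers unfamiliar with Bloom filter encoding, the following is a minimal Python sketch of how a name might be encoded and compared. It is an illustration under stated assumptions, not the protocol analyzed in the paper: the abstract does not specify how names are tokenized, so the use of padded character bigrams, the filter length `m`, the number of hash functions `k`, the double-hashing construction, the shared key `SECRET_KEY`, and the Dice coefficient for comparison are all assumptions chosen for the example.

```python
import hashlib
import hmac

# Hypothetical key shared by the linking parties (assumption for this sketch).
SECRET_KEY = b"example-shared-key"


def bigrams(name):
    """Split a name into padded character bigrams (assumed tokenization)."""
    padded = f"_{name.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(name, m=1000, k=20):
    """Return the set of bit positions set to 1 in an m-bit Bloom filter.

    Each bigram is mapped to k positions using double hashing over two
    keyed hashes; m and k are illustrative values, not taken from the paper.
    """
    bits = set()
    for gram in bigrams(name):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(SECRET_KEY, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(SECRET_KEY, data, hashlib.sha256).digest(), "big")
        for i in range(k):
            bits.add((h1 + i * h2) % m)
    return bits


def dice(a, b):
    """Dice coefficient between two filters given as sets of set-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith, smyth, jones = (bloom_encode(n) for n in ("smith", "smyth", "jones"))
    print("smith vs smyth:", round(dice(smith, smyth), 3))
    print("smith vs jones:", round(dice(smith, jones), 3))
```

In this sketch, names that share most of their bigrams produce filters with many common bits, which is what lets the encoding tolerate typographical errors during linkage. The encoding is also deterministic: the same name always yields the same filter, so some regularities of the underlying data carry over to the encoded data.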