For over fifty years, "record linkage" procedures have been refined to integrate data in the face of typographical and semantic errors. These procedures are traditionally performed over personal identifiers (e.g., names), but in modern decentralized environments, privacy concerns have led to regulations that require the obfuscation of such attributes. Various techniques have been proposed to resolve this tension, including secure multi-party computation protocols; however, such protocols are computationally intensive and do not scale to real-world linkage scenarios. More recently, procedures based on Bloom filter encoding (BFE) have gained traction in various applications, such as healthcare, where they yield highly accurate record linkage results in a reasonable amount of time. Though the model is promising, no formal security analysis has been designed for or applied to it, which is of concern considering the sensitivity of the corresponding data. In this paper, we introduce a novel attack, based on constraint satisfaction, to provide a rigorous analysis of BFE, along with guidelines on how to mitigate the risk posed by the attack. In addition, we conduct an empirical analysis with data derived from public voter records to illustrate the feasibility of the attack. Our investigations show that the parameters of the BFE protocol can be configured to make it relatively resilient to the proposed attack without a significant reduction in record linkage performance.
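To make the encoding concrete, the following is a minimal sketch of one common way Bloom filter encoding is set up for record linkage. It is an illustration under stated assumptions, not the protocol analyzed in the paper: the bigram tokenization, the double-hashing scheme, the Dice coefficient comparison, the parameter values `m` and `k`, the key `shared-secret`, and all function names are hypothetical choices made for this example.

```python
import hashlib
import hmac


def bigrams(name):
    """Split a padded, lowercased name into its set of 2-character substrings."""
    padded = "_" + name.lower() + "_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(name, m=1000, k=20, key=b"shared-secret"):
    """Return the set of bit positions that are set in an m-bit Bloom filter.

    Assumption: each bigram is mapped to k positions by double hashing,
    position_i = (h1 + i * h2) mod m, using two keyed hashes.
    """
    positions = set()
    for gram in bigrams(name):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha512).digest(), "big")
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions


def dice(a, b):
    """Dice coefficient of two Bloom filters given as sets of set-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith = encode("smith")
    # Names that share bigrams share bit positions, so typographical
    # variants score higher than unrelated names.
    print(dice(smith, encode("smyth")))
    print(dice(smith, encode("jones")))
```

In this sketch, both parties would need to agree on `m`, `k`, and the key beforehand so that identical bigrams set identical bits. The same property that makes the comparison tolerant of typos also means the bit patterns retain structure from the underlying names, which is the kind of regularity a constraint-based analysis can try to exploit.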