Integrating data from diverse sources with the aim of identifying similar records that refer to the same real-world entities, without compromising the privacy of these entities, is an emerging research problem in various domains. This problem is known as privacy-preserving record linkage (PPRL). Scalability is a major challenge for PPRL because of growing data sizes in real-world applications. Private blocking techniques have been used in PPRL to address this challenge by reducing the number of record pair comparisons that need to be conducted. Many of these private blocking techniques require a trusted third party to perform the blocking. A main threat in three-party solutions is collusion between parties to identify the private data of another party. We introduce a novel two-party private blocking technique for PPRL based on sorted nearest neighborhood clustering. Privacy is addressed by combining two privacy techniques: k-anonymous clustering and public reference values. Experiments conducted on two real-world databases validate that our approach is scalable to large databases and effective in generating candidate record pairs that correspond to true matches, while preserving k-anonymous privacy characteristics. Our approach also performs as well as or better than three other state-of-the-art private blocking techniques in terms of scalability, blocking quality, and privacy. It can achieve private blocking up to two orders of magnitude faster than other state-of-the-art private blocking approaches.
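To make the general idea concrete, the following is a minimal sketch of blocking with public reference values and a k-anonymity constraint. It is not the protocol from the paper: the function names, the toy surname keys, the rule of assigning each key to the closest preceding reference value in sorted order, and the rule for merging neighboring clusters are all assumptions chosen for illustration, and the secure exchange between the two parties is omitted entirely.

```python
from bisect import bisect_right


def assign_to_references(keys, references):
    """Group record indices under the closest preceding reference value (assumed rule)."""
    refs = sorted(references)
    clusters = {ref: [] for ref in refs}
    for record_id, key in enumerate(keys):
        # Index of the last reference value <= key, clamped to the first one.
        pos = max(bisect_right(refs, key) - 1, 0)
        clusters[refs[pos]].append(record_id)
    return refs, clusters


def k_anonymous_blocks(keys, references, k):
    """Merge neighboring reference clusters in sorted order until each block has >= k records."""
    refs, clusters = assign_to_references(keys, references)
    blocks = []
    current_refs, current_ids = [], []
    for ref in refs:
        current_refs.append(ref)
        current_ids.extend(clusters[ref])
        if len(current_ids) >= k:
            blocks.append((current_refs, current_ids))
            current_refs, current_ids = [], []
    # Fold any undersized remainder into the last block so no small block is left over.
    if current_refs:
        if blocks:
            last_refs, last_ids = blocks[-1]
            blocks[-1] = (last_refs + current_refs, last_ids + current_ids)
        elif current_ids:
            # Fewer than k records in total: a single block that cannot reach size k.
            blocks.append((current_refs, current_ids))
    return blocks


def candidate_pairs(blocks_a, blocks_b):
    """Pair records from two parties whose blocks share at least one reference value."""
    pairs = set()
    for refs_a, ids_a in blocks_a:
        for refs_b, ids_b in blocks_b:
            if set(refs_a) & set(refs_b):
                pairs.update((a, b) for a in ids_a for b in ids_b)
    return pairs


# Hypothetical toy data: public reference values and each party's blocking keys.
references = ["d", "k", "r"]
keys_a = ["adams", "baker", "miller", "smith", "taylor"]
keys_b = ["adam", "millar", "smyth", "young"]

blocks_a = k_anonymous_blocks(keys_a, references, k=2)
blocks_b = k_anonymous_blocks(keys_b, references, k=2)
pairs = candidate_pairs(blocks_a, blocks_b)
```

In this sketch each party would run `k_anonymous_blocks` locally, and only the reference values attached to each block would need to be compared, so that every reference value a party reveals stands for at least `k` of its records. Larger values of `k` give larger blocks, which means more candidate pairs to compare.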