Efficient two-party private blocking based on sorted nearest neighborhood clustering

Authors:
Dinusha Vatsalan;Peter Christen;Vassilios S. Verykios
Affiliations:
The Australian National University, Canberra, Australia;The Australian National University, Canberra, Australia;Hellenic Open University, Patras, Greece
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013



Abstract

Integrating data from diverse sources to identify records that refer to the same real-world entities, without compromising the privacy of those entities, is an emerging research problem in various domains. This problem is known as privacy-preserving record linkage (PPRL). Scalability is a main challenge for PPRL because of the growing size of data in real-world applications. Private blocking techniques address this challenge by reducing the number of record pair comparisons that need to be conducted. Many of these techniques require a trusted third party to perform the blocking. A main threat in such three-party solutions is collusion between parties to identify the private data of another party. We introduce a novel two-party private blocking technique for PPRL based on sorted nearest neighborhood clustering. Privacy is addressed by combining two privacy techniques: k-anonymous clustering and public reference values. Experiments conducted on two real-world databases validate that our approach is scalable to large databases and effective in generating candidate record pairs that correspond to true matches, while preserving k-anonymous privacy characteristics. Our approach also performs as well as or better than three other state-of-the-art private blocking techniques in terms of scalability, blocking quality, and privacy. It can achieve private blocking up to two orders of magnitude faster than other state-of-the-art private blocking approaches.
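
The abstract names two ingredients, public reference values and k-anonymous clustering, without giving the protocol itself. The following Python sketch is one simplified reading of how they might fit together on a single party's side. It is not the protocol from the paper. The function names `build_blocks` and `candidate_block_pairs`, the rule of assigning each blocking key value to the closest preceding reference value, and the greedy merging of neighboring clusters are all assumptions made for illustration.

```python
from bisect import bisect_right


def build_blocks(bkvs, reference_values, k):
    """Group blocking key values (BKVs) into blocks of at least k values.

    Assumed scheme (illustrative only): each BKV is attached to the closest
    preceding public reference value in sorted order, and neighboring
    clusters are merged until every block holds at least k BKVs.
    """
    refs = sorted(reference_values)
    buckets = [[] for _ in refs]
    for bkv in bkvs:
        # Index of the last reference value <= bkv, clamped to the first one.
        i = max(bisect_right(refs, bkv) - 1, 0)
        buckets[i].append(bkv)

    blocks = []
    cur_refs, cur_members = [], []
    for ref, members in zip(refs, buckets):
        cur_refs.append(ref)
        cur_members.extend(members)
        if len(cur_members) >= k:
            blocks.append((cur_refs, cur_members))
            cur_refs, cur_members = [], []

    # Fold an undersized tail into the last complete block. If there are
    # fewer than k BKVs in total, the single resulting block stays below k.
    if cur_refs:
        if blocks:
            blocks[-1][0].extend(cur_refs)
            blocks[-1][1].extend(cur_members)
        else:
            blocks.append((cur_refs, cur_members))
    return blocks


def candidate_block_pairs(blocks_a, blocks_b):
    """Pair up blocks from two parties that share a reference value.

    Only the public reference values are compared here, which is the
    assumed reason a third party is not needed in this simplified sketch.
    """
    pairs = []
    for i, (refs_a, _) in enumerate(blocks_a):
        for j, (refs_b, _) in enumerate(blocks_b):
            if set(refs_a) & set(refs_b):
                pairs.append((i, j))
    return pairs
```

In this sketch each party would call `build_blocks` on its own data with the same public reference values, and only the reference values of each block would be exchanged to find candidate block pairs. Records in paired blocks would then go on to a separate private comparison step, which is outside the scope of the sketch.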