Efficient two-party private blocking based on sorted nearest neighborhood clustering

  • Authors:
  • Dinusha Vatsalan; Peter Christen; Vassilios S. Verykios

  • Affiliations:
  • The Australian National University, Canberra, Australia; The Australian National University, Canberra, Australia; Hellenic Open University, Patras, Greece

  • Venue:
  • Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
  • Year:
  • 2013

Abstract

Integrating data from diverse sources with the aim of identifying similar records that refer to the same real-world entities, without compromising the privacy of these entities, is an emerging research problem in various domains. This problem is known as privacy-preserving record linkage (PPRL). Scalability is a main challenge for PPRL due to the growing size of data in real-world applications. Private blocking techniques have been used in PPRL to address this challenge by reducing the number of record pair comparisons that need to be conducted. Many of these private blocking techniques require a trusted third party to perform the blocking. One main threat with three-party solutions is collusion between parties to identify the private data of another party. We introduce a novel two-party private blocking technique for PPRL based on sorted nearest neighborhood clustering. Privacy is addressed by a combination of two privacy techniques, k-anonymous clustering and public reference values. Experiments conducted on two real-world databases validate that our approach is scalable to large databases and effective in generating candidate record pairs that correspond to true matches, while preserving k-anonymous privacy characteristics. Our approach also performs as well as or better than three other state-of-the-art private blocking techniques in terms of scalability, blocking quality, and privacy. It can achieve private blocking up to two orders of magnitude faster than other state-of-the-art private blocking approaches.
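
The general idea of blocking against sorted public reference values with a minimum block size of k can be illustrated with a minimal Python sketch. This is not the authors' protocol: it runs in a single process, uses no encoding or message exchange between the parties, and the merging rule is a simplification chosen for illustration. The function names (bucket_by_reference, build_blocks, candidate_pairs) and the toy surnames are hypothetical.

```python
from bisect import bisect_right
from itertools import product


def bucket_by_reference(keys, refs):
    """Place each (record_id, key) under the last sorted reference value <= key.

    Keys that sort before every reference value go under the first one.
    `refs` must already be sorted.
    """
    slots = [[] for _ in refs]
    for record_id, key in keys:
        i = max(bisect_right(refs, key) - 1, 0)
        slots[i].append(record_id)
    return slots


def build_blocks(keys_a, keys_b, references, k):
    """Merge neighbouring reference slots until both sides hold >= k records.

    Assumption for this sketch: both parties' slot sizes are visible to the
    merging step. A real two-party protocol would have to avoid that.
    """
    refs = sorted(references)
    slots_a = bucket_by_reference(keys_a, refs)
    slots_b = bucket_by_reference(keys_b, refs)
    blocks, cur_a, cur_b = [], [], []
    for slot_a, slot_b in zip(slots_a, slots_b):
        cur_a += slot_a
        cur_b += slot_b
        if len(cur_a) >= k and len(cur_b) >= k:
            blocks.append((cur_a, cur_b))
            cur_a, cur_b = [], []
    if cur_a or cur_b:
        # A leftover tail smaller than k is folded into the previous block.
        if blocks:
            last_a, last_b = blocks[-1]
            blocks[-1] = (last_a + cur_a, last_b + cur_b)
        else:
            blocks.append((cur_a, cur_b))
    return blocks


def candidate_pairs(blocks):
    """Only records that share a block are compared."""
    pairs = []
    for ids_a, ids_b in blocks:
        pairs.extend(product(ids_a, ids_b))
    return pairs


# Toy data: record identifiers with a surname as the blocking key.
party_a = [("a1", "adams"), ("a2", "brown"), ("a3", "miller"),
           ("a4", "nolan"), ("a5", "smith"), ("a6", "taylor")]
party_b = [("b1", "adam"), ("b2", "baker"), ("b3", "mills"),
           ("b4", "moore"), ("b5", "smyth"), ("b6", "turner")]
public_refs = ["d", "m", "s"]  # hypothetical public reference values

blocks = build_blocks(party_a, party_b, public_refs, k=2)
pairs = candidate_pairs(blocks)
print(len(pairs), "candidate pairs instead of", len(party_a) * len(party_b))
```

In this toy run, a larger k forces more slots to merge, which yields fewer but larger blocks and therefore more candidate pairs. This is the trade-off between privacy (larger k) and the number of comparisons that blocking is meant to reduce.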