A distributed framework for scaling up LSH-based computations in privacy preserving record linkage

  • Authors:
  • Dimitrios Karapiperis;Vassilios S. Verykios

  • Affiliations:
  • Hellenic Open University, Patras, Greece;Hellenic Open University, Patras, Greece

  • Venue:
  • Proceedings of the 6th Balkan Conference in Informatics
  • Year:
  • 2013

Abstract

Privacy Preserving Record Linkage (PPRL) is the scientific field that explores methods of linking datasets to identify common entities efficiently and accurately while preserving the privacy of the underlying data. In this paper we present a distributed Locality Sensitive Hashing (LSH)-based framework for linking huge collections of records. The framework groups similar records efficiently and distributes computations uniformly among underutilized commodity hardware resources, without imposing extra overhead on the existing infrastructure, thus promoting scalability. We also propose two methods of assessing computational cost, aiming to distribute the workload evenly among compute nodes.
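
The abstract does not give the framework's details, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes records have already been encoded as fixed-length bit vectors, uses bit-sampling LSH for Hamming distance to group similar records into buckets, estimates each bucket's cost as its number of candidate pairs, and assigns buckets to nodes greedily. All function names, parameters, and the cost estimate are hypothetical.

```python
import random
from collections import defaultdict


def make_hash_tables(num_bits, num_tables, bits_per_key, seed=0):
    """Pick, for each hash table, a random subset of bit positions (assumed scheme)."""
    rng = random.Random(seed)
    return [rng.sample(range(num_bits), bits_per_key) for _ in range(num_tables)]


def block_records(records, tables):
    """Group record ids by their sampled bits in each table.

    records: dict mapping record id -> tuple of 0/1 bits.
    Returns a dict mapping (table index, sampled bits) -> list of record ids.
    """
    buckets = defaultdict(list)
    for rec_id, bits in records.items():
        for t, positions in enumerate(tables):
            key = tuple(bits[p] for p in positions)
            buckets[(t, key)].append(rec_id)
    return buckets


def bucket_cost(ids):
    """Hypothetical cost estimate: number of pairwise comparisons in a bucket."""
    n = len(ids)
    return n * (n - 1) // 2


def assign_buckets(buckets, num_nodes):
    """Greedy balancing: give the costliest remaining bucket to the least-loaded node."""
    loads = [0] * num_nodes
    assignment = [[] for _ in range(num_nodes)]
    ordered = sorted(buckets.items(), key=lambda kv: bucket_cost(kv[1]), reverse=True)
    for key, ids in ordered:
        node = min(range(num_nodes), key=loads.__getitem__)
        assignment[node].append(key)
        loads[node] += bucket_cost(ids)
    return assignment, loads
```

Under these assumptions, identical bit vectors fall into the same bucket in every table, while vectors that differ in every bit never share one. Records that are merely similar collide with a probability that grows as their Hamming distance shrinks, which is what makes the buckets useful as candidate groups for linkage.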