Quantifying the correctness, computational complexity, and security of privacy-preserving string comparators for record linkage

  • Authors:
  • Elizabeth Durham;Yuan Xue;Murat Kantarcioglu;Bradley Malin

  • Affiliations:
  • Department of Biomedical Informatics, Vanderbilt University, 2525 West End Avenue, Nashville, TN 37203, USA;Department of Electrical Engineering & Computer Science, Vanderbilt University, 400 24th Avenue South, Nashville, TN 37212, USA;Department of Computer Science, University of Texas at Dallas, 2601 North Floyd Road, Richardson, TX 75083, USA;Department of Biomedical Informatics, Vanderbilt University, 2525 West End Avenue, Nashville, TN 37203, USA and Department of Electrical Engineering & Computer Science, Vanderbilt University, 400 ...

  • Venue:
  • Information Fusion
  • Year:
  • 2012

Abstract

Record linkage is the task of identifying records from disparate data sources that refer to the same entity. It is an integral component of data processing in distributed settings, where the integration of information from multiple sources can prevent duplication and enrich overall data quality, thus enabling more detailed and correct analysis. Privacy-preserving record linkage (PPRL) is a variant of the task in which data owners wish to perform linkage without revealing identifiers associated with the records. This task is desirable in various domains, including healthcare, where it may not be possible to reveal patient identity due to confidentiality requirements, and business, where it could be disadvantageous to divulge customers' identities. To perform PPRL, it is necessary to apply string comparators that function in the privacy-preserving space. A number of privacy-preserving string comparators (PPSCs) have been proposed, but little research has compared them in the context of a real record linkage application. This paper performs a principled and comprehensive evaluation of six PPSCs in terms of three key properties: (1) correctness of record linkage predictions, (2) computational complexity, and (3) security. We use a real, publicly available dataset, derived from the North Carolina voter registration database, to evaluate the tradeoffs among these properties. Among our results, we find that PPSCs that partition, encode, and compare strings yield highly accurate record linkage results. However, as a tradeoff, we observe that such PPSCs are less secure than those that map and compare strings in a reduced dimensional space.
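
The abstract does not name the six comparators, so the following is only a minimal sketch of what a "partition, encode, and compare" PPSC could look like, not the method evaluated in the paper. It assumes a Bloom-filter-style encoding: each string is partitioned into padded bigrams, each bigram is hashed with a shared secret key into bit positions, and two encodings are compared with the Dice coefficient. The function names, the key, and the parameters m (filter length) and k (hashes per bigram) are all hypothetical choices made for illustration.

```python
import hashlib
import hmac


def bigrams(s):
    """Partition: split a string into overlapping 2-grams, padded at both ends."""
    padded = f"_{s.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(s, key, m=1000, k=20):
    """Encode: map each bigram to k positions in a length-m bit array.

    The set of positions that are switched on is returned instead of a
    full bit array. key, m, and k are illustrative assumptions.
    """
    bits = set()
    for gram in bigrams(s):
        for i in range(k):
            message = f"{i}:{gram}".encode("utf-8")
            digest = hmac.new(key, message, hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % m)
    return bits


def dice(a, b):
    """Compare: Dice coefficient over the sets of switched-on positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Both data owners must share the same key for encodings to be comparable.
shared_key = b"hypothetical-shared-secret"
smith = encode("smith", shared_key)
smyth = encode("smyth", shared_key)
jones = encode("jones", shared_key)

print(dice(smith, smyth))  # similar names share most bigrams, so the score is high
print(dice(smith, jones))  # no shared bigrams, so only chance collisions overlap
```

In this sketch the comparison runs on the encodings alone, so neither party sees the other's plaintext identifiers. The encoding still preserves bigram overlap, which is what makes approximate matching possible and is also a plausible reason such schemes leak more than comparators that work in a reduced dimensional space.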