Record linkage performance for large data sets

  • Authors:
  • Jordi Gómez-Bao;Josep-L. Larriba-Pey;Josepa Ribes Puig

  • Affiliations:
  • Universitat Politècnica de Catalunya, Barcelona, Spain;Universitat Politècnica de Catalunya, Barcelona, Spain;Pla Director d'Oncologia de Catalunya, L'Hospitalet de Llobregat, Spain

  • Venue:
  • Proceedings of the ACM first international workshop on Privacy and anonymity for very large databases
  • Year:
  • 2009

Abstract

We propose new data structures that speed up Record Linkage by exploiting the value distribution of common string attributes, such as name or surname. At the cost of some additional memory, they increase processing speed by almost an order of magnitude with no loss of recall or precision. The improvement is independent of both the method used to reduce the number of record comparisons, such as Blocking or Sliding Window, and the specific string comparison functions.
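
The abstract does not describe the data structures themselves, so the following is only a minimal sketch of one way a skewed value distribution could be exploited; it is not the authors' method. It assumes that attributes such as surnames repeat often, so comparison results can be memoized per pair of distinct values in exchange for extra memory. The names `edit_distance`, `CachedComparator`, and the sample surnames are hypothetical, and edit distance stands in for whichever string comparison function is actually used.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; a stand-in for any symmetric string comparator."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


class CachedComparator:
    """Hypothetical helper: stores one result per unordered pair of values."""

    def __init__(self, compare):
        self.compare = compare
        self.cache = {}
        self.calls = 0  # number of real (uncached) comparisons performed

    def __call__(self, a: str, b: str) -> int:
        # Canonical key order is safe because the comparator is symmetric.
        key = (a, b) if a <= b else (b, a)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.compare(*key)
        return self.cache[key]


# Assumed sample data: a few surnames with heavy repetition.
surnames = ["garcia", "garcia", "martinez", "garcia",
            "martines", "lopez", "martinez", "lopez"]

# Candidate pairs; a blocking or sliding-window step could supply these instead.
pairs = list(combinations(range(len(surnames)), 2))

cached = CachedComparator(edit_distance)
matches_cached = [(i, j) for i, j in pairs
                  if cached(surnames[i], surnames[j]) <= 1]
matches_plain = [(i, j) for i, j in pairs
                 if edit_distance(surnames[i], surnames[j]) <= 1]
```

Because the cache only returns values the comparator would have computed anyway, `matches_cached` and `matches_plain` are the same list, which is consistent with the claim that recall and precision are unaffected. The saving comes from `cached.calls` being smaller than `len(pairs)`, and the cache sits behind the comparator, so it does not depend on how the candidate pairs were generated.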