LINDA: distributed web-of-data-scale entity matching

  • Authors:
  • Christoph Böhm; Gerard de Melo; Felix Naumann; Gerhard Weikum

  • Affiliations:
  • Hasso Plattner Institute, Potsdam, Germany; ICSI Berkeley, Berkeley, CA, USA; Hasso Plattner Institute, Potsdam, Germany; Max Planck Institute for Informatics, Saarbrücken, Germany

  • Venue:
  • Proceedings of the 21st ACM International Conference on Information and Knowledge Management
  • Year:
  • 2012

Abstract

Linked Data has emerged as a powerful way of interconnecting structured data on the Web. However, the cross-linkage between Linked Data sources is not as extensive as one would hope for. In this paper, we formalize the task of automatically creating "sameAs" links across data sources in a globally consistent manner. Our algorithm, presented in a multi-core as well as a distributed version, achieves this link generation by accounting for joint evidence of a match. Experiments confirm that our system scales beyond 100 million entities and delivers highly accurate results despite the vast heterogeneity and daunting scale.
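
To make "globally consistent" more concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes that "sameAs" is treated as an equivalence relation, that candidate links arrive with a similarity score, and that no two distinct entities from the same source may end up equated. The function name consistent_same_as, the (source, id) entity representation, and the scores are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: an entity is a (source, local id) pair.
Entity = Tuple[str, str]
Candidate = Tuple[float, Entity, Entity]


def consistent_same_as(candidates: List[Candidate]) -> List[Tuple[Entity, Entity]]:
    """Greedily accept scored candidate links while keeping the result consistent.

    Assumption (not taken from the paper): a link is rejected if it would place
    two different entities of the same source into one equivalence class.
    """
    parent: Dict[Entity, Entity] = {}
    sources: Dict[Entity, Set[str]] = {}  # cluster root -> sources in that cluster

    def find(e: Entity) -> Entity:
        if e not in parent:
            parent[e] = e
            sources[e] = {e[0]}
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    accepted: List[Tuple[Entity, Entity]] = []
    # Consider the strongest evidence first.
    for _score, a, b in sorted(candidates, key=lambda c: c[0], reverse=True):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already implied by transitivity of earlier links
        if sources[ra] & sources[rb]:
            continue  # would equate two entities from the same source
        parent[rb] = ra
        sources[ra] |= sources[rb]
        accepted.append((a, b))
    return accepted


if __name__ == "__main__":
    links = consistent_same_as([
        (0.9, ("dbpedia", "Berlin"), ("geonames", "Berlin")),
        (0.8, ("geonames", "Berlin"), ("freebase", "Berlin")),
        (0.7, ("dbpedia", "Berlin_NH"), ("freebase", "Berlin")),
    ])
    for a, b in links:
        print(a, "sameAs", b)
```

In this toy input the third candidate is rejected: accepting it would, by transitivity, equate two different entities from the same hypothetical source. A greedy pass like this only illustrates the consistency constraint; it does not model the joint evidence or the multi-core and distributed execution described in the abstract.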