Identifying candidate datasets for data interlinking

  • Authors:
  • Luiz André P. Paes Leme;Giseli Rabello Lopes;Bernardo Pereira Nunes;Marco Antonio Casanova;Stefan Dietze

  • Affiliations:
  • Computer Science Institute, Fluminense Federal University, Niterói, RJ, Brazil;Department of Informatics, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil;Department of Informatics, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil,L3S Research Center, Leibniz University Hannover, Hannover, Germany;Department of Informatics, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil;L3S Research Center, Leibniz University Hannover, Hannover, Germany

  • Venue:
  • ICWE'13 Proceedings of the 13th international conference on Web Engineering
  • Year:
  • 2013

Abstract

One of the design principles that can stimulate the growth and increase the usefulness of the Web of data is URI linkage. However, related URIs typically reside in different datasets managed by different publishers. Hence, the designer of a new dataset must be aware of the existing datasets and inspect their content to define sameAs links. This paper proposes a technique based on probabilistic classifiers that, given a dataset S to be published and a set T of known published datasets, ranks each Ti ∈ T according to the probability that links between S and Ti can be found, so that only the most relevant datasets need to be inspected. Results from our technique show that the search space can be reduced by up to 85%, thereby greatly decreasing the computational effort.
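
The ranking idea can be illustrated with a minimal sketch. The abstract does not say which classifier or which features the authors use, so everything below is an assumption made for illustration: a naive Bayes style score with Laplace smoothing, features that are simply sets of labels describing a dataset (for example vocabulary prefixes), and the hypothetical names `rank_candidates`, `known_links`, and `source_features`. The toy data is invented and is not taken from the paper.

```python
import math
from collections import Counter


def rank_candidates(known_links, source_features, alpha=1.0):
    """Rank candidate target datasets for a new dataset S.

    known_links: dict mapping a target dataset name Ti to a list of feature
        sets, one per already published dataset known to link to Ti.
        (Hypothetical input format, assumed for this sketch.)
    source_features: set of features describing S.
    alpha: Laplace smoothing constant.
    Returns the target names sorted from most to least probable.
    """
    total_examples = sum(len(examples) for examples in known_links.values())
    scores = {}
    for target, examples in known_links.items():
        n = len(examples)
        # Log prior: share of known linking datasets that point to this target.
        log_score = math.log(n / total_examples)
        # Number of linking datasets in which each feature occurs.
        counts = Counter(f for example in examples for f in example)
        for feature in source_features:
            # Smoothed estimate of P(feature present | links to target).
            log_score += math.log((counts[feature] + alpha) / (n + 2 * alpha))
        scores[target] = log_score
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, invented for illustration only.
known_links = {
    "dbpedia": [{"foaf", "geo"}, {"foaf", "dc"}, {"geo"}],
    "geonames": [{"geo"}, {"geo", "wgs84"}],
    "dblp": [{"dc", "swrc"}],
}
source_features = {"geo", "wgs84"}

ranking = rank_candidates(known_links, source_features)
# ranking == ["geonames", "dbpedia", "dblp"]

# Inspecting only the top-k candidates is what shrinks the search space.
top_k = ranking[:1]
```

Under these assumptions, a designer would inspect only the first few entries of `ranking` when looking for sameAs links, rather than every dataset in T.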