Keys and pseudo-keys detection for web datasets cleansing and interlinking

  • Authors:
  • Manuel Atencia; Jérôme David; François Scharffe

  • Affiliations:
  • INRIA & LIG, France, Université de Grenoble 1, France; INRIA & LIG, France, Université de Grenoble 2, France; Université de Montpellier 2 & LIRMM, France

  • Venue:
  • EKAW'12 Proceedings of the 18th international conference on Knowledge Engineering and Knowledge Management
  • Year:
  • 2012

Abstract

This paper introduces a method for analyzing web datasets based on key dependencies. The classical notion of a key in relational databases is adapted to RDF datasets. To better handle web data of variable quality, the notion of a pseudo-key is introduced. An RDF vocabulary for representing keys is also provided, and an algorithm for discovering keys and pseudo-keys is described. Experimental results show that the runtime of the algorithm remains reasonable even for a large dataset such as DBpedia. Two applications are discussed: (i) detecting errors in RDF datasets, and (ii) interlinking datasets.
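
To make the idea of keys and pseudo-keys concrete, here is a minimal Python sketch over a toy set of triples. It is not the paper's algorithm or its exact definitions, which the abstract does not state. It assumes that a set of properties is a key when no two subjects share the same set of values for all of those properties, and that a pseudo-key is a set of properties for which this holds for at least a chosen fraction of subjects. The function names (`unique_ratio`, `is_key`, `is_pseudo_key`), the `threshold` parameter, the `ex:` identifiers, and the data are all hypothetical.

```python
from collections import defaultdict


def unique_ratio(triples, properties):
    """Fraction of subjects whose combined values for `properties` are unique.

    Assumption (not from the paper): only subjects that have at least one
    value for every property are counted, and two subjects collide when
    their value sets are equal for every property.
    """
    props = sorted(properties)

    # subject -> property -> set of values (RDF properties may be multi-valued)
    values = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in triples:
        if prop in props:
            values[subject][prop].add(obj)

    # Group subjects by their value signature over the candidate properties.
    groups = defaultdict(list)
    for subject, by_prop in values.items():
        if all(p in by_prop for p in props):
            signature = tuple(frozenset(by_prop[p]) for p in props)
            groups[signature].append(subject)

    total = sum(len(group) for group in groups.values())
    if total == 0:
        return 0.0
    unique = sum(1 for group in groups.values() if len(group) == 1)
    return unique / total


def is_key(triples, properties):
    """Strict key under the assumed definition: no collisions at all."""
    return unique_ratio(triples, properties) == 1.0


def is_pseudo_key(triples, properties, threshold=0.9):
    """Hypothetical pseudo-key test: tolerate a bounded share of collisions."""
    return unique_ratio(triples, properties) >= threshold


# Toy data: (subject, property, object) triples with made-up identifiers.
TRIPLES = [
    ("ex:a", "ex:name", "Alice"), ("ex:a", "ex:born", "1980"),
    ("ex:b", "ex:name", "Bob"), ("ex:b", "ex:born", "1980"),
    ("ex:c", "ex:name", "Alice"), ("ex:c", "ex:born", "1975"),
    ("ex:d", "ex:name", "Dana"), ("ex:d", "ex:born", "1990"),
]

if __name__ == "__main__":
    print(is_key(TRIPLES, {"ex:name"}))             # two subjects share "Alice"
    print(is_key(TRIPLES, {"ex:name", "ex:born"}))  # the pair separates all four
```

In this toy data neither `ex:name` nor `ex:born` alone separates all subjects, while the pair does. Subjects that collide on a candidate pseudo-key are the kind of cases the abstract's two applications would inspect: they may be erroneous descriptions or duplicates that should be linked.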