Keys and pseudo-keys detection for web datasets cleansing and interlinking

Authors:
Manuel Atencia; Jérôme David; François Scharffe
Affiliations:
INRIA & LIG, France, Université de Grenoble 1, France; INRIA & LIG, France, Université de Grenoble 2, France; Université de Montpellier 2 & LIRMM, France
Venue:
EKAW'12 Proceedings of the 18th international conference on Knowledge Engineering and Knowledge Management
Year:
2012

Abstract

This paper introduces a method for analyzing web datasets based on key dependencies. It adapts the classical notion of a key in relational databases to RDF datasets and, to better handle web data of variable quality, defines the notion of a pseudo-key. The paper also provides an RDF vocabulary for representing keys and describes an algorithm for discovering keys and pseudo-keys. Experimental results show that the runtime of the algorithm remains reasonable even for a large dataset such as DBpedia. Two applications are further discussed: (i) detecting errors in RDF datasets and (ii) interlinking datasets.
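
To make the idea of keys and pseudo-keys over RDF data concrete, the following is a minimal Python sketch. It is not the algorithm from the paper: it is a naive enumeration over small property sets, written under stated assumptions. A set of properties is treated as a key when no two subjects share the same values for all of them, and as a pseudo-key when the fraction of value groups containing a single subject reaches a chosen threshold. That measure, the function names, and the toy triples are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def property_values(triples):
    """Map each subject to {property: frozenset of its objects}."""
    table = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        table[s][p].add(o)
    return {s: {p: frozenset(v) for p, v in props.items()}
            for s, props in table.items()}


def group_by_values(table, props):
    """Group subjects that share identical values on every property in props.

    Subjects lacking one of the properties are ignored (an assumption of
    this sketch, not a claim about the paper's definition).
    """
    groups = defaultdict(list)
    for s, pv in table.items():
        if all(p in pv for p in props):
            groups[tuple(pv[p] for p in props)].append(s)
    return groups


def discriminability(table, props):
    """Fraction of value groups holding exactly one subject (assumed measure)."""
    groups = group_by_values(table, props)
    if not groups:
        return 0.0
    singletons = sum(1 for g in groups.values() if len(g) == 1)
    return singletons / len(groups)


def find_keys(triples, threshold=1.0, max_size=2):
    """Return minimal property sets whose discriminability meets the threshold.

    threshold=1.0 yields exact keys; a lower value yields pseudo-keys.
    """
    table = property_values(triples)
    all_props = sorted({p for pv in table.values() for p in pv})
    found = []
    for size in range(1, max_size + 1):
        for props in combinations(all_props, size):
            if any(set(k) <= set(props) for k in found):
                continue  # a subset already qualifies, so props is not minimal
            if discriminability(table, props) >= threshold:
                found.append(props)
    return found


def collisions(table, props):
    """Groups of subjects sharing values on props: candidate errors or duplicates."""
    groups = group_by_values(table, props)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Toy data (hypothetical): three resources described by two properties.
triples = [
    ("ex:a", "name", "Ann"), ("ex:a", "city", "Paris"),
    ("ex:b", "name", "Bob"), ("ex:b", "city", "Paris"),
    ("ex:c", "name", "Ann"), ("ex:c", "city", "Lyon"),
]

exact_keys = find_keys(triples)                  # only the pair is a key
pseudo_keys = find_keys(triples, threshold=0.5)  # single properties qualify
shared_names = collisions(property_values(triples), ("name",))
```

In this sketch, the subjects returned by `collisions` are the exceptions to a pseudo-key. Depending on the dataset, they may point to erroneous values or to duplicate resources that could be interlinked, which mirrors the two applications mentioned in the abstract.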