The logic of RDF and SPARQL: a tutorial

  • Authors:
  • Enrico Franconi; Sergio Tessaris

  • Affiliations:
  • Free University of Bozen-Bolzano, Italy; Free University of Bozen-Bolzano, Italy

  • Venue:
  • Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2006

Abstract

The Resource Description Framework (RDF [Hayes, 2004]) is a W3C standard language for representing information about resources in the World Wide Web; it provides a common framework for expressing this information so that it can be exchanged between applications without loss of meaning. In this tutorial, RDF will be presented as a data model in the database sense: its motivations will be analysed and its current formal status reviewed. The data model can be understood both from a graph-theoretical perspective and from a logical perspective. While the former has been the focus of most theoretical (see, e.g., [Gutierrez et al., 2004]) and practical approaches to RDF, the logical view of RDF has so far been largely neglected by the community. Two logical reconstructions of RDF, both provably correct with respect to the normative W3C definitions of RDF [Hayes, 2004], will be presented: the first reduces (a fragment of) RDF to a classical first-order framework suitable for knowledge representation (first developed in [de Bruijn et al., 2005]); the second encodes the full RDF data model in HiLog, the logic introduced by Kifer et al. several years ago [Chen et al., 1993]. Particular emphasis will be given to three characteristic features of RDF: the presence of anonymous blank nodes (bnodes), the non-well-foundedness of the basic rdf:type relation, and the presence of the RDF vocabulary in the model itself.

In the second part of the tutorial, the relation between the logical reconstructions of RDF and a database perspective will be introduced. An RDF database is seen as a model of a suitable theory in first-order logic or in HiLog. While the two approaches are equivalent for pure RDF, it will be shown how the difference becomes relevant whenever additional constraints (e.g., in the form of ontologies or database dependencies) are introduced into the framework.
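To make the logical view concrete, the Python sketch below reads an RDF graph as a first-order formula: each triple (s, p, o) becomes an atom T(s, p, o) over a single ternary predicate, and each bnode becomes an existentially quantified variable. A ternary predicate is assumed here because it lets a vocabulary term such as rdf:type also occur in subject position, as in the RDF axiomatic triple rdf:type rdf:type rdf:Property. Under this reading, simple entailment between graphs reduces to finding a mapping from the bnodes of one graph to the terms of the other (the interpolation lemma of the RDF semantics [Hayes, 2004]). The `_:` prefix for bnodes, prefixed names such as `ex:alice`, and all function names are illustrative assumptions; the sketch covers simple entailment only, not RDF or RDFS entailment, and is not the reconstruction presented in the tutorial.

```python
# Sketch: an RDF graph read as an existentially quantified conjunction of atoms.
# Assumptions: bnodes carry the "_:" prefix; IRIs are abbreviated as prefixed names.
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def to_formula(graph: Set[Triple]) -> str:
    """Render the graph as an existential closure of T(s, p, o) atoms."""
    bnodes = sorted({t for triple in graph for t in triple if is_bnode(t)})
    atoms = " ∧ ".join(f"T({s}, {p}, {o})" for s, p, o in sorted(graph))
    prefix = "".join(f"∃{b} " for b in bnodes)
    return f"{prefix}({atoms})" if bnodes else atoms


def find_mapping(h: List[Triple], g: List[Triple],
                 mapping: Optional[Dict[str, str]] = None) -> Optional[Dict[str, str]]:
    """Backtracking search for a map from the bnodes of h to terms of g with map(h) ⊆ g."""
    mapping = dict(mapping or {})
    if not h:
        return mapping
    first, rest = h[0], h[1:]
    for candidate in g:
        extended = dict(mapping)
        consistent = True
        for x, y in zip(first, candidate):
            if is_bnode(x):
                # A bnode may map to any term, but always to the same one.
                if extended.setdefault(x, y) != y:
                    consistent = False
                    break
            elif x != y:
                # IRIs and literals must match exactly.
                consistent = False
                break
        if consistent:
            result = find_mapping(rest, g, extended)
            if result is not None:
                return result
    return None


def simply_entails(g: Set[Triple], h: Set[Triple]) -> bool:
    """G simply entails H iff some instance of H is a subgraph of G."""
    return find_mapping(sorted(h), sorted(g)) is not None


if __name__ == "__main__":
    G = {("ex:alice", "ex:knows", "ex:bob"), ("ex:bob", "rdf:type", "ex:Person")}
    H = {("ex:alice", "ex:knows", "_:x"), ("_:x", "rdf:type", "ex:Person")}
    print(to_formula(H))
    print(simply_entails(G, H))  # True: map _:x to ex:bob
    print(simply_entails(H, G))  # False: H says nothing about ex:bob
```

The search is exponential in the worst case, which is consistent with simple entailment being NP-complete; the naive backtracking is chosen here for readability rather than efficiency.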
In order to allow for additional constraints (e.g., in the standard W3C OWL-DL ontology language [Patel-Schneider et al., 2004]) while keeping the framework first-order, only a fragment of RDF can be considered; this restriction is not needed if the framework is based on HiLog (see, e.g., [Motik, 2005]). Various complexity and decidability results will be summarised.

In the last part of the tutorial, the W3C standard query language for RDF, SPARQL [Prud'hommeaux and Seaborne, 2006], will be presented; SPARQL is currently a W3C Candidate Recommendation. The core of SPARQL is a conjunctive query language, with the added complication that the data model includes existential information in the form of bnodes, and that bnodes may be returned by a query. The formal semantics of the core query language will be given. The problem of the canonical representation of the answer set will be introduced, since bnodes behave similarly to null values in SQL. Complexity results for query answering will be given for different cases. Finally, possible extensions of SPARQL with various classes of constraints will be discussed.
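The conjunctive core of SPARQL, basic graph pattern matching, can be sketched in Python as follows: a solution is a mapping from query variables to graph terms under which every triple pattern becomes a triple of the graph. As in SPARQL, a bnode in the pattern is treated as a variable that is never projected, while bnodes in the data may be returned as answers. The names `match` and `select`, the `?`/`_:` term conventions, and the use of set semantics (as with SELECT DISTINCT) are assumptions of this sketch, which ignores FILTER, OPTIONAL, UNION and any entailment regime beyond simple matching.

```python
# Sketch: SPARQL basic graph pattern matching as conjunctive query evaluation.
# Assumptions: query variables start with "?", bnodes with "_:"; set semantics.
from typing import Dict, Iterator, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def is_query_var(term: str) -> bool:
    # In a pattern, a bnode acts like a variable that is never projected.
    return term.startswith("?") or term.startswith("_:")


def bind(mu: Dict[str, str], q: str, t: str) -> bool:
    """Try to match pattern term q against graph term t, extending mu in place."""
    if is_query_var(q):
        return mu.setdefault(q, t) == t
    return q == t


def match(pattern: List[Triple], graph: Set[Triple],
          mu: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every mapping mu such that mu(pattern) is a subgraph of graph."""
    if not pattern:
        yield mu
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        extended = dict(mu)
        if all(bind(extended, q, t) for q, t in zip(first, triple)):
            yield from match(rest, graph, extended)


def select(variables: Sequence[str], pattern: Sequence[Triple],
           graph: Set[Triple]) -> Set[Tuple[str, ...]]:
    """Project the solutions onto the distinguished variables (like SELECT DISTINCT)."""
    return {tuple(mu[v] for v in variables) for mu in match(list(pattern), graph, {})}


if __name__ == "__main__":
    G = {("ex:alice", "ex:knows", "_:b1"), ("ex:alice", "ex:knows", "_:b2")}
    # Both data bnodes are returned as answers for ?y.
    print(select(["?y"], [("ex:alice", "ex:knows", "?y")], G))
    # The query bnode _:z is existential and not projected.
    print(select(["?x"], [("?x", "ex:knows", "_:z")], G))
```

In the first query, the answer set contains both _:b1 and _:b2, even though the graph is equivalent to either of its one-triple subgraphs (one bnode can be mapped onto the other). Two equivalent graphs can therefore yield answer sets of different sizes, which illustrates why a canonical representation of answers involving bnodes is needed.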