Sparqling Kleene: fast property paths in RDF-3X

  • Authors:
  • Andrey Gubichev; Srikanta J. Bedathur; Stephan Seufert

  • Affiliations:
  • Technische Universität München, Germany; IIIT-D, New Delhi, India; Max Planck Institute for Informatics, Germany

  • Venue:
  • First International Workshop on Graph Data Management Experiences and Systems
  • Year:
  • 2013

Abstract

As Semantic Web efforts continue to gather steam, RDF engines face graphs with millions of nodes and billions of edges. While much recent work on the resulting scalability issues in querying these datasets has mainly considered SPARQL 1.0, the next-generation query language recommendations propose adding navigational queries restricted by regular expressions to SPARQL. We address the problem of efficiently processing property paths in RDF-3X, a high-performance RDF engine. In this paper, we restrict our attention to a definition of property paths that is not only tractable but also the most commonly used: instead of enumerating all paths that satisfy the given query, we focus on reachability queries based on regular expressions. Based on this, we make three major technical contributions. First, we present a detailed account of integrating FERRARI, a recently proposed and highly compact reachability index, into the RDF-3X engine to support property path evaluation. Second, we show how property path queries can be answered efficiently using multiple instances of this index, one for each distinct label in the graph. Finally, we develop a set of queries over real-world RDF data that can serve as a benchmark for evaluating the efficiency of property path queries. Our experimental results over Yago2, a large RDF-based knowledge base, show that our proposed approach is highly scalable and flexible.
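To make the query class concrete, here is a minimal Python sketch of label-restricted reachability answered from one structure per edge label, with concatenated steps such as `p1/p2+`. It rests on several assumptions: plain BFS stands in for FERRARI (no interval labeling is implemented); the class `PerLabelReachability`, its methods, and the sample triples are hypothetical and illustrative, not RDF-3X's API or Yago2 data; and only single-label steps with `+` or `*` modifiers are supported, without alternation or inverse paths.

```python
from collections import defaultdict, deque


class PerLabelReachability:
    """Hypothetical stand-in for "one reachability index per edge label".

    Each label gets its own adjacency map; reachability is answered by plain
    BFS over that label's edges. FERRARI would replace the BFS with a compact
    interval-based index, which this sketch does not implement.
    """

    def __init__(self, triples):
        # adj[label][subject] -> set of objects linked by an edge with that label
        self.adj = defaultdict(lambda: defaultdict(set))
        for s, p, o in triples:
            self.adj[p][s].add(o)

    def reach_plus(self, label, source):
        """Nodes reachable from `source` via one or more `label` edges (label+)."""
        graph = self.adj.get(label, {})
        seen = set()
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def reach_star(self, label, source):
        """Zero or more `label` edges (label*): label+ plus the source itself."""
        return self.reach_plus(label, source) | {source}

    def is_reachable(self, label, u, v, star=False):
        """Boolean reachability query: does u reach v via label+ (or label*)?"""
        if star and u == v:
            return True
        return v in self.reach_plus(label, u)

    def reach_path(self, source, steps):
        """Evaluate a concatenation of steps such as p1/p2+ from `source`.

        `steps` is a list of (label, modifier) pairs, modifier in {"", "+", "*"}.
        """
        frontier = {source}
        for label, mod in steps:
            nxt = set()
            for node in frontier:
                if mod == "":
                    nxt |= self.adj.get(label, {}).get(node, set())
                elif mod == "+":
                    nxt |= self.reach_plus(label, node)
                elif mod == "*":
                    nxt |= self.reach_star(label, node)
                else:
                    raise ValueError(f"unsupported modifier: {mod!r}")
            frontier = nxt
        return frontier


# Illustrative triples (not taken from Yago2).
triples = [
    ("TUM", "isLocatedIn", "Munich"),
    ("Munich", "isLocatedIn", "Bavaria"),
    ("Bavaria", "isLocatedIn", "Germany"),
    ("Germany", "isLocatedIn", "Europe"),
    ("Munich", "hasNeighbor", "Augsburg"),
]
idx = PerLabelReachability(triples)

print(sorted(idx.reach_plus("isLocatedIn", "TUM")))  # ['Bavaria', 'Europe', 'Germany', 'Munich']
print(idx.is_reachable("isLocatedIn", "TUM", "Europe"))  # True
print(idx.reach_path("TUM", [("isLocatedIn", ""), ("hasNeighbor", "")]))  # {'Augsburg'}
```

Keeping edges separated by label means a `p+` step only ever touches edges labeled `p`, so each Kleene step can be answered from its label's own structure. An alternation such as `(p|q)+`, where a path may interleave labels, is not handled by this sketch as written.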