The complexity of regular expressions and property paths in SPARQL

  • Authors:
  • Katja Losemann;Wim Martens

  • Affiliations:
  • University of Bayreuth, Bayreuth, Germany;University of Bayreuth, Bayreuth, Germany

  • Venue:
  • ACM Transactions on Database Systems (TODS) - Invited papers issue
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

The World Wide Web Consortium (W3C) recently introduced property paths in SPARQL 1.1, a query language for RDF data. Property paths allow SPARQL queries to evaluate regular expressions over graph-structured data. However, they differ from standard regular expressions in several notable aspects. For example, they have a limited form of negation, they have numerical occurrence indicators as syntactic sugar, and their semantics on graphs is defined in a nonstandard manner. We formalize the W3C semantics of property paths and investigate various query evaluation problems on graphs. More specifically, let x and y be two nodes in an edge-labeled graph and r be an expression. We study the complexities of: (1) deciding whether there exists a path from x to y that matches r and (2) counting how many paths from x to y match r. Our main results show that, compared to an alternative semantics of regular expressions on graphs, the complexity of (1) and (2) under W3C semantics is significantly higher. Whereas the alternative semantics remains in polynomial time for large fragments of expressions, the W3C semantics makes problems (1) and (2) intractable almost immediately. As a side-result, we prove that the membership problem for regular expressions with numerical occurrence indicators and negation is in polynomial time.