Efficient execution of top-k SPARQL queries

  • Authors:
  • Sara Magliacane; Alessandro Bozzon; Emanuele Della Valle

  • Affiliations:
  • Politecnico of Milano, Milano, Italy, VU University Amsterdam, The Netherlands; Politecnico of Milano, Milano, Italy; Politecnico of Milano, Milano, Italy

  • Venue:
  • ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I
  • Year:
  • 2012

Abstract

Top-k queries, i.e., queries returning the top k results ordered by a user-defined scoring function, are an important category of queries. Order is an important property of data that can be exploited to speed up query processing. State-of-the-art SPARQL engines underuse order, and top-k queries are mostly managed with a materialize-then-sort processing scheme that computes all the matching solutions (e.g., thousands) even if only a limited number k (e.g., ten) are requested. The $\mathcal{S}$PARQL-$\mathcal{R}$ANK algebra is an extended SPARQL algebra that treats order as a first-class citizen, enabling efficient split-and-interleave processing schemes that can be adopted to improve the performance of top-k SPARQL queries. In this paper we propose an incremental execution model for $\mathcal{S}$PARQL-$\mathcal{R}$ANK queries, compare the performance of alternative physical operators, and propose a rank-aware join algorithm optimized for native RDF stores. Experiments conducted with an open-source implementation of a $\mathcal{S}$PARQL-$\mathcal{R}$ANK query engine based on ARQ show that the evaluation of top-k queries can be sped up by orders of magnitude.
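
The contrast between materialize-then-sort and incremental, rank-aware processing can be illustrated with a small sketch. This is not the join algorithm proposed in the paper and it does not use ARQ. It is a simplified, generic hash rank join over two in-memory lists, assuming each input is already sorted by descending score and that the scoring function is the sum of the two scores. All names (`materialize_then_sort`, `rank_join`, `stats`, the sample data) are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import islice
from math import inf


def materialize_then_sort(left, right, k):
    """Baseline: build every join result, sort them all, then keep k."""
    results = [(ls + rs, lk) for lk, ls in left for rk, rs in right if lk == rk]
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:k]


def rank_join(left, right, stats=None):
    """Yield (score, key) join results in descending score order.

    Assumptions (illustrative only): both inputs are sorted by descending
    score, and the combined score is left_score + right_score.
    """
    stats = stats if stats is not None else {}
    stats["reads"] = 0
    iters = [iter(left), iter(right)]
    seen = [{}, {}]        # per input: join key -> scores read so far
    top = [None, None]     # first (highest) score read from each input
    last = [None, None]    # most recent (lowest) score read from each input
    done = [False, False]
    heap = []              # max-heap of results, stored with negated scores

    def bound(side):
        """Upper bound on any result that uses an unread tuple of `side`."""
        other = 1 - side
        if done[side]:
            return -inf                       # nothing left to read here
        if top[other] is None:
            return -inf if done[other] else inf
        if last[side] is None:
            return inf
        return last[side] + top[other]

    side = 0
    while not all(done):
        if done[side]:
            side = 1 - side
        item = next(iters[side], None)
        if item is None:
            done[side] = True
        else:
            key, score = item
            stats["reads"] += 1
            if top[side] is None:
                top[side] = score
            last[side] = score
            seen[side].setdefault(key, []).append(score)
            # Probe the tuples already read from the other input.
            for other_score in seen[1 - side].get(key, []):
                heapq.heappush(heap, (-(score + other_score), key))
        # A buffered result is safe to emit once no future result can beat it.
        threshold = max(bound(0), bound(1))
        while heap and -heap[0][0] >= threshold:
            neg_score, key = heapq.heappop(heap)
            yield (-neg_score, key)
        side = 1 - side  # alternate between the two inputs


left = [("a", 9), ("b", 8), ("c", 5), ("d", 1)]
right = [("b", 10), ("a", 7), ("d", 4), ("c", 2)]

stats = {}
top2 = list(islice(rank_join(left, right, stats), 2))
print(top2, stats["reads"])  # [(18, 'b'), (16, 'a')] 5
```

On this toy input the rank join returns the top 2 results after reading 5 of the 8 input tuples, whereas the baseline always reads and joins everything before sorting. The saving grows with input size when k is small, which is the effect the abstract describes.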