Heuristics-based query optimisation for SPARQL

Authors:
Petros Tsialiamanis;Lefteris Sidirourgos;Irini Fundulaki;Vassilis Christophides;Peter Boncz
Affiliations:
ICS-FORTH, Heraklion, Greece;CWI, Amsterdam, the Netherlands;ICS-FORTH, Heraklion, Greece;ICS-FORTH, Heraklion, Greece;CWI, Amsterdam, the Netherlands
Venue:
Proceedings of the 15th International Conference on Extending Database Technology
Year:
2012


Abstract

Query optimization in RDF stores is a challenging problem, as SPARQL queries typically contain many more joins than equivalent relational plans and hence lead to a large join order search space. In such cases, cost-based query optimization is often not possible. One practical reason for this is that statistics are typically missing in web-scale settings such as the Linked Open Datasets (LOD). The more profound reason is that, due to the absence of schematic structure in RDF, join-hit ratio estimation requires complicated forms of correlated join statistics, and there are currently no methods to identify the relevant correlations beforehand. For this reason, the use of good heuristics is essential in SPARQL query optimization, even when they are only partially combined with cost-based statistics (i.e., hybrid query optimization). In this paper we describe a set of useful heuristics for SPARQL query optimizers. We present these in the context of a new Heuristic SPARQL Planner (HSP) that is capable of exploiting the syntactic and structural variations of the triple patterns in a SPARQL query in order to choose an execution plan without the need for any cost model. For this, we define the variable graph and show a reduction of the SPARQL query optimization problem to the maximum weight independent set problem. We implemented our planner on top of the MonetDB open-source column store, evaluated its effectiveness against the state-of-the-art RDF-3X engine, and compared the plan quality with a relational (SQL) equivalent of the benchmarks.
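
The abstract names the variable graph and the maximum weight independent set problem without defining them, so the following Python sketch is only an illustration under stated assumptions, not the HSP implementation. It assumes that the nodes are the variables shared by at least two triple patterns, that each node is weighted by the number of triple patterns containing it, and that an edge links two variables appearing in the same triple pattern. The function names, the example query, and the exhaustive search are all hypothetical choices made for this illustration.

```python
from itertools import combinations


def variables(pattern):
    """Return the variables (terms starting with '?') of one triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def build_variable_graph(patterns):
    """Build a weighted variable graph (assumed construction, see lead-in)."""
    weights = {}
    for pattern in patterns:
        for var in variables(pattern):
            weights[var] = weights.get(var, 0) + 1
    # Assumption: only variables shared by two or more patterns become nodes.
    weights = {var: w for var, w in weights.items() if w >= 2}
    edges = set()
    for pattern in patterns:
        shared = sorted(var for var in variables(pattern) if var in weights)
        for a, b in combinations(shared, 2):
            edges.add((a, b))
    return weights, edges


def max_weight_independent_set(weights, edges):
    """Exhaustive search over node subsets; exponential, so small graphs only."""
    nodes = sorted(weights)
    best, best_weight = frozenset(), 0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            chosen = set(subset)
            # Skip subsets containing both endpoints of some edge.
            if any(a in chosen and b in chosen for a, b in edges):
                continue
            total = sum(weights[var] for var in subset)
            if total > best_weight:
                best, best_weight = frozenset(subset), total
    return best, best_weight


# Hypothetical basic graph pattern, written as (subject, predicate, object).
patterns = [
    ("?person", "rdf:type", "foaf:Person"),
    ("?person", "foaf:name", "?name"),
    ("?person", "foaf:knows", "?friend"),
    ("?friend", "foaf:name", "?fname"),
    ("?friend", "ex:worksAt", "?org"),
    ("?org", "ex:locatedIn", "?city"),
    ("?org", "rdf:type", "ex:Company"),
]

weights, edges = build_variable_graph(patterns)
best, best_weight = max_weight_independent_set(weights, edges)
print(sorted(best), best_weight)  # ['?org', '?person'] 6
```

In this example the three shared variables each occur in three triple patterns, and ?friend is adjacent to both ?person and ?org, so the heaviest independent set is {?person, ?org} with total weight 6. The brute-force search is used only for brevity; since the maximum weight independent set problem is NP-hard in general, a planner would need a more careful algorithm for queries with many shared variables.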