Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins

Authors:
Thomas Neumann; Guido Moerkotte
Affiliations:
Technische Universität München, Munich, Germany; Universität Mannheim, Germany
Venue:
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Year:
2011

Citing 0
Cited 7

Efficient resource attribute retrieval in RDF triple stores
Proceedings of the 20th ACM international conference on Information and knowledge management

View selection in Semantic Web databases
Proceedings of the VLDB Endowment

Heuristics-based query optimisation for SPARQL
Proceedings of the 15th International Conference on Extending Database Technology

Efficient distributed query processing for autonomous RDF databases
Proceedings of the 15th International Conference on Extending Database Technology

SPLODGE: systematic generation of SPARQL benchmark queries for linked open data
ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I

Robust runtime optimization and skew-resistant execution of analytical SPARQL queries on pig
ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I

Hybrid query execution engine for large attributed graphs
Information Systems

Abstract

Accurate cardinality estimates are essential for successful query optimization. This is true not only for relational DBMSs but also for RDF stores. An RDF database consists of a set of triples and can therefore be seen as a relational database with a single table of three attributes. This makes RDF rather special in that queries typically contain many self-joins. We show that relational DBMSs are not well prepared to perform cardinality estimation in this context. Further, there are hardly any cardinality estimation methods designed specifically for RDF databases. To overcome this lack of appropriate methods, we introduce characteristic sets together with new cardinality estimation methods based upon them. We then show experimentally that, in the RDF context, the new methods are highly superior to the estimation methods employed by commercial DBMSs and by the open-source RDF store RDF-3X.
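
The abstract does not define characteristic sets. As a rough illustration only, the sketch below assumes the definition used in the paper: the characteristic set of a subject is the set of predicates that occur with that subject. Counting how many subjects share each characteristic set then gives the number of distinct subjects matched by a star-shaped query (several triple patterns joined on the same subject) without evaluating any self-join. The function names and the toy triples are hypothetical and are not taken from the paper or from RDF-3X.

```python
from collections import Counter, defaultdict


def characteristic_sets(triples):
    """Map each distinct predicate set to the number of subjects
    whose outgoing predicates are exactly that set."""
    preds_by_subject = defaultdict(set)
    for subject, predicate, _obj in triples:
        preds_by_subject[subject].add(predicate)
    return Counter(frozenset(preds) for preds in preds_by_subject.values())


def count_star_subjects(char_sets, query_predicates):
    """Number of distinct subjects that have every predicate in the query:
    sum the counts of all characteristic sets that contain the query set."""
    wanted = frozenset(query_predicates)
    return sum(n for preds, n in char_sets.items() if wanted <= preds)


# Toy data (hypothetical): three subjects with different predicate sets.
triples = [
    ("book1", "title", "A"), ("book1", "author", "X"),
    ("book2", "title", "B"), ("book2", "author", "Y"), ("book2", "year", "2011"),
    ("book3", "title", "C"),
]

cs = characteristic_sets(triples)
# Subjects that have both a title and an author: book1 and book2.
print(count_star_subjects(cs, ["title", "author"]))  # prints 2
```

This sketch covers only the distinct-subject count. Estimating the number of result tuples when a subject has several objects for the same predicate would additionally require per-predicate occurrence counts, which are left out here.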