Efficient distributed query processing for autonomous RDF databases

Authors:
Fabian Prasser; Alfons Kemper; Klaus A. Kuhn
Affiliations:
Technische Universität München, Garching, Germany, and Technische Universität München, University Hospital (Klinikum rechts der Isar), München, Germany; Technische Universität München, Garching, Germany; Technische Universität München, University Hospital (Klinikum rechts der Isar), München, Germany
Venue:
Proceedings of the 15th International Conference on Extending Database Technology
Year:
2012

Abstract

The inherent flexibility of the RDF data model has led to its notable adoption in many domains, especially in the life sciences. Some of these domains have an emerging need to access data integrated from various distributed sources of information. It is not always possible to implement this by simply loading all data into one central RDF store. For example, in inter-institutional collaborations for drug development and clinical research, participants often want to maintain control over their local databases. In such cases, distributed query processing techniques can be used instead: queries are evaluated by accessing the remote data sources only on demand and in conformance with local authorization models. In this paper, we present an efficient approach to distributed query processing for large autonomous RDF databases. The groundwork is laid by a comprehensive RDF-specific synopsis at both the schema and the instance level. We present an optimizer that uses this synopsis to generate compact execution plans by precisely determining, at compile time, which sources are relevant to a query. Furthermore, we present a tightly integrated query engine that further reduces the volume of intermediate results at run time. An extensive evaluation shows that, compared to a naïve implementation, our approach improves query execution times by up to two orders of magnitude and reduces transferred data volumes by up to three orders of magnitude.
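The split between compile-time source selection and run-time reduction of intermediate results can be illustrated with a minimal Python sketch. It rests on simplifying assumptions. `SourceSynopsis`, `select_sources` and `semi_join` are hypothetical names. The per-source synopsis here is only a stand-in, a set of predicates (schema level) plus an exact set of subjects (instance level), and it is not the synopsis defined in the paper. The run-time step uses a plain semi-join reduction, which is one common way to shrink intermediate results and not necessarily the authors' technique.

```python
from dataclasses import dataclass, field

# A triple pattern is (subject, predicate, object); terms starting with "?" are variables.
TriplePattern = tuple[str, str, str]


def is_var(term: str) -> bool:
    """True if the term is a SPARQL-style variable such as '?drug'."""
    return term.startswith("?")


@dataclass
class SourceSynopsis:
    """Hypothetical per-source summary (a stand-in, not the paper's synopsis).

    schema level:   predicates that occur at the source
    instance level: subjects stored at the source (kept exact here for clarity)
    """
    name: str
    predicates: set[str]
    subjects: set[str] = field(default_factory=set)

    def may_answer(self, pattern: TriplePattern) -> bool:
        s, p, _ = pattern
        if not is_var(p) and p not in self.predicates:
            return False  # schema-level pruning: predicate never occurs here
        if not is_var(s) and s not in self.subjects:
            return False  # instance-level pruning: subject never occurs here
        return True


def select_sources(query: list[TriplePattern],
                   synopses: list[SourceSynopsis]) -> dict[TriplePattern, list[str]]:
    """Compile-time step: map each triple pattern to the sources that may contribute."""
    return {tp: [syn.name for syn in synopses if syn.may_answer(tp)] for tp in query}


def semi_join(left: list[dict], right: list[dict], var: str) -> list[dict]:
    """Run-time step: keep only right-hand bindings whose value for `var` occurs on the left.

    In a distributed setting, the distinct values of `var` would be shipped to the
    remote source, so non-matching tuples are never transferred back.
    """
    keys = {row[var] for row in left}
    return [row for row in right if row.get(var) in keys]


if __name__ == "__main__":
    # Hypothetical sources and IRIs, for illustration only.
    sources = [
        SourceSynopsis("hospital_a", {"ex:diagnosis", "ex:name"}, {"ex:patient1"}),
        SourceSynopsis("pharma_b", {"ex:targets", "ex:name"}, {"ex:drug7"}),
    ]
    query = [("?d", "ex:targets", "?t"), ("ex:patient1", "ex:name", "?n")]
    for pattern, names in select_sources(query, sources).items():
        print(pattern, "->", names)
```

In this toy query, each triple pattern is routed to exactly one source, so the plan never contacts a source that cannot contribute. An exact subject set keeps the pruning easy to follow, but it grows with the data. A large source would need a compact and possibly approximate summary instead, at the cost of occasionally contacting a source that turns out to contribute nothing.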