H2RDF: adaptive query processing on RDF data in the cloud.

  • Authors:
  • Nikolaos Papailiou;Ioannis Konstantinou;Dimitrios Tsoumakos;Nectarios Koziris

  • Affiliations:
  • Computing Systems Laboratory, School of ECE, National Technical University of Athens, Athens, Greece;Computing Systems Laboratory, School of ECE, National Technical University of Athens, Athens, Greece;Computing Systems Laboratory, School of ECE, National Technical University of Athens, Athens, Greece;Computing Systems Laboratory, School of ECE, National Technical University of Athens, Athens, Greece

  • Venue:
  • Proceedings of the 21st international conference companion on World Wide Web
  • Year:
  • 2012

Abstract

In this work we present H2RDF, a fully distributed RDF store that combines the MapReduce processing framework with a NoSQL distributed data store. Our system features two unique characteristics that enable efficient processing of both simple and multi-join SPARQL queries on a virtually unlimited number of triples: join algorithms that execute joins according to query selectivity to reduce processing, and an adaptive choice between centralized and distributed (MapReduce-based) join execution for fast query responses. Our system efficiently answers both simple joins and complex multivariate queries and easily scales to 3 billion triples using a small cluster of 9 worker nodes. H2RDF outperforms state-of-the-art distributed solutions in multi-join and non-selective queries while achieving performance comparable to centralized solutions in selective queries. In this demonstration we showcase the system's functionality through an interactive GUI. Users will be able to execute predefined or custom-made SPARQL queries on datasets of different sizes, using different join algorithms. Moreover, they can repeat all queries with a different number of cluster resources. Using real-time cluster monitoring and detailed statistics, participants will be able to understand the advantages of the different execution schemes relative to the input data, as well as the scalability properties of H2RDF with respect to both the data size and the available worker resources.
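
The two ideas named in the abstract, selectivity-aware join ordering and an adaptive choice between centralized and MapReduce-based execution, can be illustrated with a minimal Python sketch. This is not H2RDF's actual algorithm or cost model: the `TriplePattern` type, the `estimated_matches` field, the `CENTRALIZED_INPUT_LIMIT` threshold, and the example patterns and counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriplePattern:
    """A SPARQL triple pattern with a hypothetical estimate of matching triples."""
    text: str
    estimated_matches: int


# Hypothetical cutoff (not from the paper): joins whose estimated input
# is at most this many triples are run on a single node.
CENTRALIZED_INPUT_LIMIT = 1_000_000


def order_by_selectivity(patterns):
    """Return the patterns with the most selective (fewest matches) first."""
    return sorted(patterns, key=lambda p: p.estimated_matches)


def choose_execution(patterns, limit=CENTRALIZED_INPUT_LIMIT):
    """Pick 'centralized' for a small estimated join input, else 'mapreduce'."""
    total_input = sum(p.estimated_matches for p in patterns)
    return "centralized" if total_input <= limit else "mapreduce"


# Illustrative queries with made-up cardinality estimates.
selective_query = [
    TriplePattern("?x rdf:type ub:Professor", 40_000),
    TriplePattern("?x ub:worksFor <Dept0>", 30),
]
nonselective_query = [
    TriplePattern("?x rdf:type ub:Student", 900_000_000),
    TriplePattern("?x ub:takesCourse ?c", 1_200_000_000),
]

if __name__ == "__main__":
    for name, query in [("selective", selective_query),
                        ("non-selective", nonselective_query)]:
        plan = [p.text for p in order_by_selectivity(query)]
        print(name, choose_execution(query), plan)
```

Under these assumptions, a query touching few triples is joined on a single node to avoid job start-up overhead, while a query over billions of triples is handed to the distributed path. A real system would presumably use a richer cost estimate than a single sum of pattern cardinalities.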