H2RDF: adaptive query processing on RDF data in the cloud.

Authors:
Nikolaos Papailiou; Ioannis Konstantinou; Dimitrios Tsoumakos; Nectarios Koziris
Affiliations:
Computing Systems Laboratory, School of ECE, National Technical University of Athens, Athens, Greece (all authors)
Venue:
Proceedings of the 21st international conference companion on World Wide Web
Year:
2012

Abstract

In this work we present H2RDF, a fully distributed RDF store that combines the MapReduce processing framework with a NoSQL distributed data store. Our system features two key characteristics that enable efficient processing of both simple and multi-join SPARQL queries over a virtually unlimited number of triples: (i) join algorithms that execute joins according to query selectivity in order to reduce processing, and (ii) an adaptive choice between centralized and distributed (MapReduce-based) join execution for fast query responses. H2RDF efficiently answers both simple joins and complex multivariate queries, and it scales to 3 billion triples using a small cluster of 9 worker nodes. It outperforms state-of-the-art distributed solutions on multi-join and non-selective queries while achieving performance comparable to centralized solutions on selective queries.

In this demonstration we showcase the system's functionality through an interactive GUI. Users will be able to execute predefined or custom SPARQL queries on datasets of different sizes, using different join algorithms, and to repeat any query with a different number of cluster resources. Using real-time cluster monitoring and detailed statistics, participants will be able to see the advantages of each execution scheme relative to the input data, as well as how H2RDF scales with both data size and available worker resources.
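To make the idea of selectivity-driven, adaptive join execution concrete, here is a minimal Python sketch. It is not H2RDF's actual planner or cost model. The `TriplePattern` type, the per-pattern binding estimates, the `CENTRALIZED_LIMIT` threshold, the "sum of input sizes" cost proxy and the assumption that every pattern shares one join variable are all hypothetical. The sketch orders triple patterns from most to least selective and sends each join either to a single node or to MapReduce depending on its estimated input.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class TriplePattern:
    text: str          # SPARQL triple pattern, e.g. "?x ub:advisor ?y"
    est_bindings: int  # hypothetical statistic: estimated number of matching triples

# Hypothetical threshold (not taken from the paper): joins whose estimated
# input stays at or below this many bindings run centrally on one node.
CENTRALIZED_LIMIT = 100_000

def choose_join_mode(left: int, right: int, limit: int = CENTRALIZED_LIMIT) -> str:
    """Crude cost proxy: total estimated input size of the join."""
    return "centralized" if left + right <= limit else "mapreduce"

def plan_query(patterns: List[TriplePattern],
               limit: int = CENTRALIZED_LIMIT) -> List[Tuple[str, str]]:
    """Greedy plan: join patterns from most to least selective.

    Assumes all patterns share one join variable (e.g. ?x), so any order is valid.
    Returns one (pattern joined in, execution mode) pair per join.
    """
    if not patterns:
        return []
    ordered = sorted(patterns, key=lambda p: p.est_bindings)
    current = ordered[0].est_bindings  # estimated size of the running result
    plan = []
    for p in ordered[1:]:
        plan.append((p.text, choose_join_mode(current, p.est_bindings, limit)))
        # Simplifying assumption: the join result is no larger than its smaller
        # input. Real joins can grow, so a real planner would estimate this.
        current = min(current, p.est_bindings)
    return plan

if __name__ == "__main__":
    # Illustrative LUBM-style patterns; the binding counts are made up.
    query = [
        TriplePattern("?x ub:takesCourse ?c", 5_000_000),
        TriplePattern("?x ub:memberOf <http://www.Department0.University0.edu>", 500),
        TriplePattern("?x ub:advisor ?y", 20_000),
    ]
    for step in plan_query(query):
        print(step)
```

On the three illustrative patterns, the sketch joins the `memberOf` pattern with the `advisor` pattern centrally (about 20,500 estimated bindings) and hands the join with `takesCourse` to MapReduce (about 5 million). Ordering the most selective patterns first keeps intermediate results small for as long as possible. As a result, selective queries stay on the cheaper single-node path, and only joins with large inputs pay the overhead of launching a distributed job.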