Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store

Authors: Kurt Rohloff; Richard E. Schantz
Affiliations: BBN Technologies, Cambridge, MA, USA (both authors)
Venue: Proceedings of the Fourth International Workshop on Data-Intensive Distributed Computing
Year: 2011

Abstract

Graph data processing is an emerging application area for cloud computing because few other information infrastructures permit scalable graph data processing cost-effectively. We present the Clause-Iteration approach, a scalable cloud-based approach that uses the MapReduce model to process queries on graph data. Used in conjunction with a MapReduce framework, our Clause-Iteration algorithms respond to SPARQL queries over RDF data. The innovation of the approach lies in 1) constructing query responses iteratively, incrementally growing the number of query clauses considered in a response, and 2) using flagged keys to join the results of these incremental responses. The Clause-Iteration algorithms form the basis of SHARD, our scalable graph-store built on the Hadoop implementation of MapReduce. On a standard benchmark graph with 800 million edges, SHARD performs favorably compared with existing "industrial" graph-stores. We also discuss design considerations and alternatives for constructing scalable graph processing technologies.
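The two ideas above, growing the set of query clauses one at a time and joining partial results via flagged keys, can be illustrated with a small sketch. What follows is a minimal, single-process Python simulation under stated assumptions, not SHARD's Hadoop implementation. Triples are plain string tuples, a query is a list of triple-pattern clauses whose variables start with `?`, and the helper names (`match_clause`, `flagged_join`, `clause_iteration`) and the `"P"`/`"C"` flags are hypothetical. Each iteration matches one new clause against the data, then performs a reduce-side style join. In this join, every binding is keyed on the values of the variables it shares with the partial result and tagged with a flag recording which side it came from. Only keys that receive both flags produce joined output.

```python
from collections import defaultdict
from itertools import product


def is_var(term):
    # Assumption: SPARQL-style variables are strings starting with "?".
    return term.startswith("?")


def match_clause(clause, triples):
    """Map-style pass (hypothetical helper): bind one clause's variables against every triple."""
    out = []
    for triple in triples:
        binding, ok = {}, True
        for pat, val in zip(clause, triple):
            if is_var(pat):
                # A variable repeated within one clause must bind to the same value.
                if binding.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok:
            out.append(binding)
    return out


def flagged_join(partial, new, shared_vars):
    """Simulated MapReduce join keyed on shared variables, with a source flag per record."""
    groups = defaultdict(lambda: {"P": [], "C": []})
    # "Map": emit (join key, flagged binding); "P" = partial result, "C" = new clause matches.
    for flag, bindings in (("P", partial), ("C", new)):
        for b in bindings:
            key = tuple(b[v] for v in shared_vars)
            groups[key][flag].append(b)
    # "Reduce": only keys that saw both flags yield merged bindings.
    joined = []
    for group in groups.values():
        for p, c in product(group["P"], group["C"]):
            joined.append({**p, **c})
    return joined


def clause_iteration(clauses, triples):
    """Grow the response one clause at a time, joining on already-bound variables."""
    partial = match_clause(clauses[0], triples)
    bound = {t for t in clauses[0] if is_var(t)}
    for clause in clauses[1:]:
        clause_vars = {t for t in clause if is_var(t)}
        shared = sorted(bound & clause_vars)  # empty list means a cross product
        partial = flagged_join(partial, match_clause(clause, triples), shared)
        bound |= clause_vars
    return partial


# Toy data (invented for illustration, not from the paper's benchmark).
TRIPLES = [
    ("alice", "advisor", "bob"),
    ("bob", "worksFor", "deptA"),
    ("carol", "advisor", "dave"),
    ("dave", "worksFor", "deptB"),
    ("alice", "memberOf", "deptA"),
]
QUERY = [
    ("?student", "advisor", "?prof"),
    ("?prof", "worksFor", "?dept"),
    ("?student", "memberOf", "?dept"),
]

if __name__ == "__main__":
    print(clause_iteration(QUERY, TRIPLES))
    # prints [{'?student': 'alice', '?prof': 'bob', '?dept': 'deptA'}]
```

In this toy graph the first two clauses match both alice and carol, but only alice satisfies the third clause, so a single binding survives. In a real MapReduce deployment, each clause-matching pass and each flagged join would presumably run as distributed map and reduce phases; the sketch only mirrors that data flow in memory.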