Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store

  • Authors: Kurt Rohloff; Richard E. Schantz

  • Affiliations: BBN Technologies, Cambridge, MA, USA (both authors)

  • Venue: Proceedings of the fourth international workshop on Data-intensive distributed computing
  • Year: 2011

Abstract

Graph data processing is an emerging application area for cloud computing because few other information infrastructures permit scalable graph data processing cost-effectively. We present a scalable, cloud-based approach to processing queries on graph data using the MapReduce model, which we call the Clause-Iteration approach. We present algorithms that, when used in conjunction with a MapReduce framework, respond to SPARQL queries over RDF data. The innovation in the Clause-Iteration approach comes from (1) the iterative construction of query responses by incrementally growing the number of query clauses considered in a response, and (2) our use of flagged keys to join the results of these incremental responses. The Clause-Iteration algorithms form the basis of our scalable SHARD graph-store, built on the Hadoop implementation of MapReduce. SHARD performs favorably when compared to existing "industrial" graph-stores on a standard benchmark graph with 800 million edges. We discuss design considerations and alternatives associated with constructing scalable graph processing technologies.
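The abstract does not spell out the algorithms, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It simulates, in memory and in plain Python, one possible reading of clause iteration: each pass adds one query clause, and a map step emits records keyed on the values of the shared variables, with a flag marking whether a record comes from the earlier partial responses or from the new clause. A reduce step then joins records with the same key. The function names (`match`, `map_step`, `reduce_step`, `clause_iteration`), the flag values, and the sample triples are all hypothetical, and no Hadoop API is used.

```python
from collections import defaultdict
from itertools import product

PARTIAL, CLAUSE = "P", "C"  # hypothetical flag values, chosen for this sketch


def is_var(term):
    return term.startswith("?")


def match(clause, triple):
    """Return variable bindings if the triple matches the clause, else None."""
    binding = {}
    for term, value in zip(clause, triple):
        if is_var(term):
            if binding.get(term, value) != value:
                return None  # same variable bound to two different values
            binding[term] = value
        elif term != value:
            return None  # constant in the clause does not match
    return binding


def map_step(partials, triples, clause, shared):
    """Emit (key, flag, binding) records; the key is the shared-variable values."""
    for b in partials:
        yield tuple(b[v] for v in shared), PARTIAL, b
    for t in triples:
        b = match(clause, t)
        if b is not None:
            yield tuple(b[v] for v in shared), CLAUSE, b


def reduce_step(records):
    """Group records by key, then join flagged partials with flagged clause matches."""
    groups = defaultdict(lambda: {PARTIAL: [], CLAUSE: []})
    for key, flag, binding in records:
        groups[key][flag].append(binding)
    joined = []
    for g in groups.values():
        for old, new in product(g[PARTIAL], g[CLAUSE]):
            joined.append({**old, **new})
    return joined


def clause_iteration(triples, clauses):
    """Grow the response one clause at a time, joining after each pass."""
    first = clauses[0]
    partials = [b for b in (match(first, t) for t in triples) if b is not None]
    bound = {term for term in first if is_var(term)}
    for clause in clauses[1:]:
        clause_vars = {term for term in clause if is_var(term)}
        shared = sorted(bound & clause_vars)
        partials = reduce_step(map_step(partials, triples, clause, shared))
        bound |= clause_vars
    return partials


if __name__ == "__main__":
    # Made-up RDF-like triples and a two-clause query, for illustration only.
    data = [
        ("alice", "knows", "bob"),
        ("bob", "livesIn", "boston"),
        ("carol", "knows", "dave"),
        ("alice", "knows", "erin"),
    ]
    query = [("?x", "knows", "?y"), ("?y", "livesIn", "?city")]
    for row in clause_iteration(data, query):
        print(row)
```

In a real MapReduce deployment each pass would be a separate job over distributed data, and the flag would let a reducer tell the two record sources apart without a second lookup. Here both steps run in a single process purely to show the data flow.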