Efficient processing of RDF graph pattern matching on MapReduce platforms

  • Authors:
  • Padmashree Ravindra;Seokyong Hong;HyeongSik Kim;Kemafor Anyanwu

  • Affiliations:
  • North Carolina State University, Raleigh, NC, USA;North Carolina State University, Raleigh, NC, USA;North Carolina State University, Raleigh, NC, USA;North Carolina State University, Raleigh, NC, USA

  • Venue:
  • Proceedings of the second international workshop on Data intensive computing in the clouds
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Broadened adoption of the Linking Open Data tenets has led to a significant surge in the amount of Semantic Web data, particularly RDF data. This has positioned the issue of scalable data processing techniques for RDF as a central issue in the Semantic Web research community. The RDF data model is a fine-grained model representing relationships as binary relations. Thus, answering queries (typically graph pattern matching queries) over RDF data requires several join operations to reassemble related data. While MapReduce based processing is emerging as the de facto paradigm for processing large scale data, it is known to be inefficient for join-intensive workloads. In addition, most of the existing techniques for optimizing RDF data processing do not transfer well to the MapReduce model and often require significant lead time for pre-processing. Such a requirement may not be desirable for on-demand cloud database scenarios where the goal is to reduce the Time-To-Result (TTR). In this position paper, we argue that some of these challenges can be overcome by rethinking the operators for graph pattern processing, as well as adopting dynamic optimization techniques that exploit information from the previous execution steps to eliminate intermediate results that are irrelevant in the context of future execution steps. We present some preliminary evaluation results of the proposed techniques.