SPARQL basic graph pattern processing with iterative MapReduce

  • Authors:
  • Jaeseok Myung;Jongheum Yeon;Sang-goo Lee

  • Affiliations:
  • Seoul National University, Seoul, Republic of Korea;Seoul National University, Seoul, Republic of Korea;Seoul National University, Seoul, Republic of Korea

  • Venue:
  • Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
  • Year:
  • 2010

Abstract

There have been a number of approaches that adopt the RDF data model and the MapReduce framework for data warehousing, since the data model is well suited to data integration and the processing framework to large-scale, fault-tolerant data analysis. Nevertheless, most approaches consider the data model and the framework separately. It has been difficult to create synergy because only a few algorithms connect the two. In this paper, we offer a general and efficient MapReduce algorithm for the SPARQL Basic Graph Pattern, which is a set of triple patterns to be joined. In MapReduce, the join operation is known to require computationally expensive MapReduce iterations. For this reason, we minimize the number of iterations in two ways. First, we adopt the traditional multi-way join in MapReduce instead of multiple individual joins. Second, by analyzing a given query, we select a good join key to avoid unnecessary iterations. As a result, the algorithm shows good performance and scalability in terms of time and data size.
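
The two ideas in the abstract, joining several triple patterns in one iteration and choosing the join key by inspecting the query, can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the function names (`match`, `pick_join_key`, `multiway_join`), the sample data, and the greedy "most shared variable" heuristic are all hypothetical, and the map and reduce phases are plain Python loops rather than a real MapReduce job. It also assumes every pattern contains the chosen key; patterns that do not would need a further iteration.

```python
from collections import Counter, defaultdict
from itertools import product


def is_var(term):
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match(pattern, triple):
    """Return the variable bindings if the triple matches the pattern, else None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.get(p, t) != t:  # same variable bound to two values
                return None
            binding[p] = t
        elif p != t:  # constant term does not match
            return None
    return binding


def pick_join_key(patterns):
    """Assumed heuristic: the variable that occurs in the most patterns
    (ties broken alphabetically)."""
    counts = Counter(v for pat in patterns for v in set(filter(is_var, pat)))
    return max(sorted(counts), key=lambda v: counts[v])


def multiway_join(patterns, triples, key):
    """Simulate ONE MapReduce iteration that joins all patterns on `key`."""
    # Map phase: group matches by the value bound to the join key.
    groups = defaultdict(lambda: defaultdict(list))
    for triple in triples:
        for i, pat in enumerate(patterns):
            b = match(pat, triple)
            if b is not None and key in b:
                groups[b[key]][i].append(b)

    # Reduce phase: per key value, combine one binding from every pattern.
    results = []
    for by_pattern in groups.values():
        if len(by_pattern) < len(patterns):
            continue  # some pattern has no match for this key value
        for combo in product(*(by_pattern[i] for i in range(len(patterns)))):
            merged, consistent = {}, True
            for b in combo:
                for var, val in b.items():
                    if merged.setdefault(var, val) != val:
                        consistent = False
            if consistent:
                results.append(merged)
    return results


# Hypothetical query: three triple patterns sharing the variable ?x.
patterns = [
    ("?x", "type", "Student"),
    ("?x", "name", "?n"),
    ("?x", "memberOf", "?d"),
]
# Hypothetical data set.
triples = [
    ("s1", "type", "Student"), ("s1", "name", "Ann"), ("s1", "memberOf", "CS"),
    ("s2", "type", "Student"), ("s2", "name", "Bob"),
    ("p1", "type", "Professor"), ("p1", "name", "Kim"), ("p1", "memberOf", "CS"),
]

key = pick_join_key(patterns)
rows = multiway_join(patterns, triples, key)
print(key, rows)
```

Because all three patterns share `?x`, a single grouping step answers the query here, whereas joining the patterns pairwise would take two consecutive join steps. That difference in iteration count is the cost the abstract aims to reduce.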