Efficient SPARQL query processing in MapReduce through data partitioning and indexing

  • Authors:
  • Zhi Nie; Fang Du; Yueguo Chen; Xiaoyong Du; Linhao Xu

  • Affiliations:
  • Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, China and School of Information, Renmin University of China, Beijing, China;Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, China and School of Information, Renmin University of China, Beijing, China;Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, China;Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, China and School of Information, Renmin University of China, Beijing, China;IBM Research China, Beijing, China

  • Venue:
  • APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
  • Year:
  • 2012

Abstract

Processing SPARQL queries on a single node does not scale, given the rapid growth of RDF knowledge bases. This calls for scalable solutions for SPARQL query processing over Web-scale RDF data. There have been attempts to apply SPARQL query processing techniques in MapReduce environments. However, no study has been conducted on finding optimal partitioning and indexing schemes for distributing RDF data in MapReduce. In this paper, we investigate an RDF data partitioning technique that provides effective indexing schemes to support efficient SPARQL query processing in MapReduce. Our extensive experiments over a very large real-life RDF dataset demonstrate the performance of the proposed partitioning and indexing schemes for SPARQL query processing.