Efficient SPARQL query processing in MapReduce through data partitioning and indexing

Authors:
Zhi Nie;Fang Du;Yueguo Chen;Xiaoyong Du;Linhao Xu
Affiliations:
Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, China and School of Information, Renmin University of China, Beijing, China;Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, China and School of Information, Renmin University of China, Beijing, China;Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, China;Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, China and School of Information, Renmin University of China, Beijing, China;IBM Research China, Beijing, China
Venue:
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Year:
2012

Abstract

Processing SPARQL queries on a single node does not scale, given the rapid growth of RDF knowledge bases. This calls for scalable solutions for SPARQL query processing over Web-scale RDF data. There have been attempts at applying SPARQL query processing techniques in MapReduce environments. However, no study has yet addressed finding optimal partitioning and indexing schemes for distributing RDF data in MapReduce. In this paper, we investigate RDF data partitioning techniques that provide effective indexing schemes to support efficient SPARQL query processing in MapReduce. Our extensive experiments over a large real-life RDF dataset demonstrate the performance of the proposed partitioning and indexing schemes for efficient SPARQL query processing.
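To make the general idea concrete, the following is a minimal, in-process sketch of how partitioning RDF data can help a MapReduce-style SPARQL evaluation. It is not the scheme proposed in this paper, whose details the abstract does not give. It assumes a generic predicate-based (vertical) partitioning, in which the partition map doubles as a predicate index, and a standard reduce-side join over a basic graph pattern whose triple patterns all share one join variable. All names (`partition_by_predicate`, `evaluate_bgp`, the toy triples) are hypothetical, and the map, shuffle, and reduce phases are simulated with plain Python functions rather than a real Hadoop job.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "ruc"),
    ("bob", "worksAt", "ibm"),
    ("carol", "worksAt", "ruc"),
]

def partition_by_predicate(triples):
    """Vertical partitioning (assumed scheme): one (subject, object) table per predicate.
    The resulting dict also acts as a predicate -> partition index."""
    parts = defaultdict(list)
    for s, p, o in triples:
        parts[p].append((s, o))
    return dict(parts)

def map_phase(parts, patterns, join_var):
    """Emit (join_key, (pattern_index, binding)) for every row matching a pattern.
    Each pattern is (subject_var, predicate, object_var); only the partition of
    that predicate is scanned, which is what partitioning buys us."""
    for i, (sv, pred, ov) in enumerate(patterns):
        for s, o in parts.get(pred, []):
            binding = {sv: s, ov: o}
            yield binding[join_var], (i, binding)

def shuffle(pairs):
    """Group intermediate values by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, n_patterns):
    """Reduce-side join: combine bindings that share a join key across all patterns."""
    results = []
    for key in sorted(groups):
        by_pattern = defaultdict(list)
        for i, binding in groups[key]:
            by_pattern[i].append(binding)
        if len(by_pattern) < n_patterns:
            continue  # at least one pattern has no match for this key
        for combo in product(*(by_pattern[i] for i in range(n_patterns))):
            merged = {}
            for b in combo:
                merged.update(b)
            results.append(merged)
    return results

def evaluate_bgp(triples, patterns, join_var):
    """Evaluate a basic graph pattern whose patterns all contain join_var."""
    parts = partition_by_predicate(triples)
    return reduce_phase(shuffle(map_phase(parts, patterns, join_var)), len(patterns))

if __name__ == "__main__":
    # SELECT ?x ?y ?org WHERE { ?x knows ?y . ?y worksAt ?org }
    query = [("?x", "knows", "?y"), ("?y", "worksAt", "?org")]
    for row in evaluate_bgp(TRIPLES, query, "?y"):
        print(row)
```

Only the `knows` and `worksAt` partitions are read, rather than the full triple set. Queries whose patterns join on different variables would typically need a chain of such jobs, and constants in subject or object position are not handled in this simplified sketch.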