Scalable RDF graph querying using cloud computing

  • Authors:
  • Ren Li;Dan Yang;Haibo Hu;Juan Xie;Li Fu

  • Affiliations:
  • College of Computer Science, Chongqing University, Chongqing, China;School of Software Engineering, Chongqing University, Chongqing, China;School of Software Engineering, Chongqing University, Chongqing, China;School of Software Engineering, Chongqing University, Chongqing, China;School of Software Engineering, Chongqing University, Chongqing, China

  • Venue:
  • Journal of Web Engineering
  • Year:
  • 2013

Abstract

With the rapid growth of Semantic Web technologies, conventional SPARQL processing tools do not scale well to large amounts of RDF data because they are designed for a single-machine context. Several optimization solutions based on cloud computing technologies have been proposed to overcome this limitation. However, these approaches consider only SPARQL Basic Graph Pattern (BGP) processing, and their file-system-based storage schemas make random updates to large-scale RDF data difficult. This paper presents a scalable framework for processing SPARQL Group Graph Patterns (GGPs) over large RDF graphs. We design a novel storage schema on HBase for RDF data. Furthermore, we propose a query plan generation algorithm that determines the MapReduce jobs using a greedy selection strategy, together with several query algorithms that answer SPARQL GGP queries in the MapReduce paradigm. An experiment in a simulated cloud computing environment shows that our framework is more scalable and efficient than traditional approaches when storing and retrieving large volumes of RDF data.
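The abstract does not spell out the greedy job-selection rule, so the following is only a minimal sketch of one common heuristic of this kind, not the authors' algorithm. It assumes that each relation (a triple pattern or an intermediate result) is modelled solely by the set of variables it binds, that one MapReduce job can join each relation at most once, and that only the BGP joins inside a group pattern are planned (OPTIONAL, UNION, and FILTER are ignored). The names `pattern_vars` and `greedy_job_plan` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


def pattern_vars(pattern: Triple) -> FrozenSet[str]:
    """Variables bound by a triple pattern (terms starting with '?')."""
    return frozenset(term for term in pattern if term.startswith("?"))


def greedy_job_plan(patterns: List[Triple]) -> List[List[str]]:
    """Hypothetical greedy planner: group BGP joins into MapReduce jobs.

    Within one job a relation may take part in at most one join, so the
    planner repeatedly picks the variable shared by the most still-unused
    relations, joins them, and stops when no variable is shared any more.
    The join results become the inputs of the next job.
    """
    relations: List[FrozenSet[str]] = [pattern_vars(p) for p in patterns]
    jobs: List[List[str]] = []
    while len(relations) > 1:
        unused = list(relations)             # relations not yet joined in this job
        produced: List[FrozenSet[str]] = []  # join results emitted by this job
        job: List[str] = []
        while True:
            counts: Dict[str, int] = {}
            for rel in unused:
                for var in rel:
                    counts[var] = counts.get(var, 0) + 1
            shared = [(n, v) for v, n in counts.items() if n >= 2]
            if not shared:
                break
            # Greedy choice: join the most relations; ties broken by variable name.
            _, var = min(shared, key=lambda nv: (-nv[0], nv[1]))
            joined = [rel for rel in unused if var in rel]
            unused = [rel for rel in unused if var not in rel]
            produced.append(frozenset().union(*joined))
            job.append(var)
        if not job:
            raise ValueError("disconnected pattern: a cross product would be needed")
        relations = produced + unused
        jobs.append(job)
    return jobs


if __name__ == "__main__":
    query = [
        ("?s", "rdf:type", "ub:GraduateStudent"),
        ("?s", "ub:memberOf", "?d"),
        ("?s", "ub:takesCourse", "?c"),
        ("?d", "ub:subOrganizationOf", "?u"),
    ]
    print(greedy_job_plan(query))  # [['?s'], ['?d']]
```

In this example the first job joins the three patterns sharing `?s`, and a second job joins that result with the remaining pattern on `?d`. Favouring the variable that joins the most relations tends to reduce the number of jobs, which matters because each MapReduce job carries significant startup and I/O overhead; like any greedy rule, it is not guaranteed to produce the minimum number of jobs.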