Scalable RDF graph querying using cloud computing

Authors:
Ren Li;Dan Yang;Haibo Hu;Juan Xie;Li Fu
Affiliations:
College of Computer Science, Chongqing University, Chongqing, China;School of Software Engineering, Chongqing University, Chongqing, China;School of Software Engineering, Chongqing University, Chongqing, China;School of Software Engineering, Chongqing University, Chongqing, China;School of Software Engineering, Chongqing University, Chongqing, China
Venue:
Journal of Web Engineering
Year:
2013

Abstract

With the rapid growth of semantic web technologies, conventional SPARQL processing tools do not scale well to large amounts of RDF data because they are designed for a single-machine context. Several optimization solutions combined with cloud computing technologies have been proposed to overcome this limitation. However, these approaches consider only SPARQL Basic Graph Pattern (BGP) processing, and their file-system-based storage schemas make random modification of large-scale RDF data difficult. This paper presents a scalable SPARQL Group Graph Pattern (GGP) processing framework for large RDF graphs. We design a novel storage schema on HBase to store RDF data. Furthermore, we propose a query plan generation algorithm that determines the MapReduce jobs using a greedy selection strategy. Several query algorithms are also presented to answer SPARQL GGP queries in the MapReduce paradigm. An experiment in a simulated cloud computing environment shows that our framework is more scalable and efficient than traditional approaches when storing and retrieving large volumes of RDF data.
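The abstract does not spell out the plan generation algorithm, so the following is only an illustrative sketch of what a greedy selection strategy for grouping triple patterns into join jobs could look like. The function names (`variables`, `greedy_plan`), the "most widely shared variable first" rule, and the example patterns are assumptions made for this illustration and are not taken from the paper.

```python
from collections import Counter


def variables(pattern):
    """Return the variables (terms starting with '?') of a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def greedy_plan(patterns):
    """Greedily group triple patterns into join rounds.

    Hypothetical rule (not the paper's algorithm): in each round, pick the
    variable shared by the most remaining inputs and join all of them on it.
    Each round would correspond to one MapReduce job.
    Returns a list of (join_variable, number_of_inputs_joined) tuples.
    """
    # Represent every input only by the set of variables it binds.
    inputs = [frozenset(variables(p)) for p in patterns]
    plan = []
    while len(inputs) > 1:
        counts = Counter(v for vs in inputs for v in vs)
        # Highest count wins; ties are broken alphabetically for determinism.
        var, shared_by = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        if shared_by < 2:
            raise ValueError("remaining inputs share no variable (cross product)")
        joined = [vs for vs in inputs if var in vs]
        rest = [vs for vs in inputs if var not in vs]
        # The join result binds the union of the joined inputs' variables.
        merged = frozenset().union(*joined)
        inputs = rest + [merged]
        plan.append((var, len(joined)))
    return plan


if __name__ == "__main__":
    example = [
        ("?x", "rdf:type", "ub:Student"),
        ("?x", "ub:takesCourse", "?c"),
        ("?p", "ub:teacherOf", "?c"),
    ]
    for job_number, (var, n) in enumerate(greedy_plan(example), start=1):
        print(f"job {job_number}: join {n} inputs on {var}")
```

Under these assumptions, fewer rounds mean fewer MapReduce jobs, which is the usual motivation for a greedy choice of this kind; a real planner would also need to weigh selectivity and handle OPTIONAL, UNION, and FILTER clauses in a Group Graph Pattern.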