Scalable SAPRQL querying processing on large RDF data in cloud computing environment

Authors:
Buwen Wu;Hai Jin;Pingpeng Yuan
Affiliations:
Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China;Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China;Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
Venue:
ICPCA/SWS'12 Proceedings of the 2012 international conference on Pervasive Computing and the Networked World
Year:
2012

Citing 19
Cited 0

Integrating vertical and horizontal partitioning into automated physical database design

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Distribution Design of Logical Database Schemas

IEEE Transactions on Software Engineering
Scalable semantic web data management using vertical partitioning

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
MapReduce: simplified data processing on large clusters

Communications of the ACM - 50th anniversary issue: 1958 - 2008
SPARQL basic graph pattern optimization using selectivity estimation

Proceedings of the 17th international conference on World Wide Web
Pig latin: a not-so-foreign language for data processing

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
SW-Store: a vertically partitioned DBMS for Semantic Web data management

The VLDB Journal — The International Journal on Very Large Data Bases
Sideways Information Passing for Push-Style Query Processing

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Scalable join processing on very large RDF graphs

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads

Proceedings of the VLDB Endowment
RAPID: Enabling Scalable Ad-Hoc Analytics on the Semantic Web

ISWC '09 Proceedings of the 8th International Semantic Web Conference
The RDF-3X engine for scalable management of RDF data

The VLDB Journal — The International Journal on Very Large Data Bases
LUBM: A benchmark for OWL knowledge base systems

Web Semantics: Science, Services and Agents on the World Wide Web
Matrix "Bit" loaded: a scalable lightweight join query processor for RDF data

Proceedings of the 19th international conference on World wide web
YARS2: a federated repository for querying graph structured data from the web

ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
SPARQL query optimization on top of DHTs

ISWC'10 Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I
High-performance, massively scalable distributed systems using the MapReduce software framework: the SHARD triple-store

Programming Support Innovations for Emerging Distributed Applications
Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing

IEEE Transactions on Knowledge and Data Engineering
Efficient processing of RDF graph pattern matching on MapReduce platforms

Proceedings of the second international workshop on Data intensive computing in the clouds

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recently the flexibility of RDF data model makes increasing number of organizations and communities keep their data available in the RDF format. There is a growing need for querying these data in scalable and efficient way. MapReduce is a parallel data processing solution for processing large data-intensive workloads, which is not supported directly for join-intensive workloads. In this paper, we present a schema based hybrid partitioning technique for RDF triples placement according to the relationships between them, and reduce the necessary number of MR cycles in each SAPRQL query job. Then we propose a lightweight sideways information passing techniques which pass the join information across MR jobs to decrease the intermediate results involved in join operations. The experimental results show that our approaches achieve a substantial performance improvement, and outperform the previous system by a factor of 2-20 using LUBM benchmark.