Scaling queries over big RDF graphs with semantic hash partitioning

Authors:
Kisung Lee;Ling Liu
Affiliations:
Georgia Institute of Technology;Georgia Institute of Technology
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

RDF datasets are growing beyond the performance capacity of conventional RDF data management systems that run on a single node. Applications that use large RDF data need efficient data partitioning to support RDF data access on a cluster of compute nodes. In this paper we present a novel semantic hash partitioning approach and implement Shape, a Semantic HAsh Partitioning-Enabled distributed RDF data management system. This paper makes three original contributions. First, the semantic hash partitioning approach extends simple hash partitioning with direction-based triple groups and direction-based triple replications. The latter enhances the former through controlled data replication that exploits data access locality, so that queries over big RDF graphs can be processed with zero or very little inter-machine communication. Second, we generate locality-optimized query execution plans that minimize the inter-machine communication cost of query processing and are more efficient than the plans used by popular multi-node RDF data management systems. Third, we provide a suite of locality-aware optimization techniques that further reduce the partition size and cut the inter-machine communication cost during distributed query processing. Experimental results show that our system scales well and can process big RDF datasets more efficiently than existing approaches.
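
The abstract names two building blocks, direction-based triple groups and direction-based triple replication, without giving details. The Python sketch below is one possible reading of them and should not be taken as the paper's algorithm. It assumes that a triple group collects all triples sharing a subject, that a group is placed by hashing its subject, and that replication follows edges in the forward (subject to object) direction for a fixed number of hops. The function names, the `hops` parameter, and the use of CRC32 as the hash are all hypothetical choices made for illustration.

```python
import zlib
from collections import defaultdict


def partition_id(term, num_partitions):
    """Map an RDF term to a partition (assumed hash: CRC32, chosen
    because Python's built-in hash() is salted per process)."""
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def subject_triple_groups(triples):
    """Group triples by subject (assumed meaning of a forward-direction
    triple group)."""
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[s].append((s, p, o))
    return groups


def partition_with_forward_replication(triples, num_partitions, hops=1):
    """Hash-partition subject groups, then copy into each partition the
    groups reachable within `hops` forward steps of its anchor subjects.

    hops=0 reduces to simple hash partitioning on the subject.
    """
    groups = subject_triple_groups(triples)
    partitions = [set() for _ in range(num_partitions)]
    for anchor in groups:
        target = partitions[partition_id(anchor, num_partitions)]
        frontier, seen = [anchor], {anchor}
        # Level 0 adds the anchor's own group; each later level adds the
        # groups of the objects discovered at the previous level.
        for _ in range(hops + 1):
            next_frontier = []
            for vertex in frontier:
                for triple in groups.get(vertex, ()):
                    target.add(triple)
                    obj = triple[2]
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.append(obj)
            frontier = next_frontier
    return partitions
```

Under these assumptions, a two-pattern path query such as `?x knows ?y . ?y worksAt ?z` can be answered inside the partition that holds `?x`, because the triples of `?y` were replicated there. The cost is extra storage, which grows with `hops`.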