Efficient processing of RDF graph pattern matching on MapReduce platforms

Authors:
Padmashree Ravindra; Seokyong Hong; HyeongSik Kim; Kemafor Anyanwu
Affiliations:
North Carolina State University, Raleigh, NC, USA (all authors)
Venue:
Proceedings of the second international workshop on Data intensive computing in the clouds
Year:
2011

Abstract

Broadened adoption of the Linking Open Data tenets has led to a significant surge in the amount of Semantic Web data, particularly RDF data. This has made scalable data processing techniques for RDF a central issue in the Semantic Web research community. The RDF data model is a fine-grained model that represents relationships as binary relations. Thus, answering queries (typically graph pattern matching queries) over RDF data requires several join operations to reassemble related data. While MapReduce-based processing is emerging as the de facto paradigm for processing large-scale data, it is known to be inefficient for join-intensive workloads. In addition, most existing techniques for optimizing RDF data processing do not transfer well to the MapReduce model and often require significant lead time for pre-processing. Such a requirement may be undesirable in on-demand cloud database scenarios, where the goal is to reduce the Time-To-Result (TTR). In this position paper, we argue that some of these challenges can be overcome by rethinking the operators for graph pattern processing and by adopting dynamic optimization techniques that exploit information from previous execution steps to eliminate intermediate results that are irrelevant to future execution steps. We present some preliminary evaluation results for the proposed techniques.
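
To make the join requirement concrete, the following is a minimal single-machine sketch in Python. It is not the operators or the optimization proposed in the paper. The toy triples, the function names (`match`, `join_on_subject`, `star_query`), and the pruning step are all assumptions made for illustration. The pruning is a plain semi-join-style filter, used here only as a stand-in for the general idea of using bindings from an earlier step to discard data that cannot contribute to later steps.

```python
from collections import defaultdict

# Toy RDF graph (made-up data): each triple is (subject, property, object).
TRIPLES = [
    ("paper1", "author", "alice"),
    ("paper1", "year", "2011"),
    ("paper2", "author", "bob"),
    ("paper2", "year", "2010"),
    ("paper3", "author", "alice"),
]


def match(triples, prop, obj=None):
    """Return (subject, object) bindings for one triple pattern.

    With obj=None the pattern is `?s prop ?o`; otherwise `?s prop obj`.
    """
    return [(s, o) for s, p, o in triples if p == prop and (obj is None or o == obj)]


def join_on_subject(left, right):
    """Hash join of two binding lists on the shared subject variable."""
    index = defaultdict(list)
    for s, o in left:
        index[s].append(o)
    return [(s, lo, ro) for s, ro in right for lo in index.get(s, [])]


def star_query(triples, author):
    """Evaluate `?p author <author> . ?p year ?y` with one join."""
    # Step 1: match the first triple pattern.
    authored = match(triples, "author", author)
    # Information carried over from step 1: subjects that can still match.
    live = {s for s, _ in authored}
    # Step 2: match the second pattern, dropping bindings whose subject
    # cannot join with step 1 (hypothetical pruning, semi-join style).
    years = [(s, o) for s, o in match(triples, "year") if s in live]
    # Step 3: reassemble the related data with a join on the subject.
    return join_on_subject(authored, years), len(years)


results, kept = star_query(TRIPLES, "alice")
print(results, kept)
```

Even this two-pattern query needs a join, because each property of a resource is stored as a separate triple. Each additional triple pattern adds another join, which is why the abstract calls such workloads join-intensive.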