Scalable processing of flexible graph pattern queries on the cloud

Authors:
Padmashree Ravindra;Kemafor Anyanwu
Affiliations:
North Carolina State University, Raleigh, NC, USA;North Carolina State University, Raleigh, NC, USA
Venue:
Proceedings of the 22nd international conference on World Wide Web companion
Year:
2013

Citing 4
Cited 0

MapReduce: simplified data processing on large clusters

Communications of the ACM - 50th anniversary issue: 1958 - 2008
DBpedia: a nucleus for a web of open data

ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
An intermediate algebra for optimizing RDF graph pattern matching on MapReduce

ESWC'11 Proceedings of the 8th extended semantic web conference on The semanic web: research and applications - Volume Part II
To nest or not to nest, when and how much: representing intermediate results of graph pattern queries in MapReduce based processing

SWIM '12 Proceedings of the 4th International Workshop on Semantic Web Information Management

Quantified Score

Hi-index	0.00

Visualization

Abstract

Flexible exploration of large RDF datasets with unknown relationships can be enabled using 'unbound-property' graph pattern queries. Relational-style processing of such queries using normalized relations results in redundant information in intermediate results due to the repetition of adjoining bound (fixed) properties. Such redundancy negatively impacts the disk I/O, network transfer costs, and the required disk space while processing RDF query workloads on MapReduce-based systems. This work proposes packing and lazy unpacking strategies to minimize the redundancy in intermediate results while processing unbound-property queries. In addition to keeping the results compact, this work evaluates RDF queries using the Nested TripleGroup Data Model and Algebra (NTGA) that enables shorter MapReduce execution workflows. Experimental results demonstrate the benefit of this work over RDF query processing using relational-style systems such as Apache Pig and Hive.