Efficient social network data query processing on MapReduce

Authors:
Liu Liu;Jiangtao Yin;Lixin Gao
Affiliations:
UMass Amherst, Amherst, MA, USA;UMass Amherst, Amherst, MA, USA;UMass Amherst, Amherst, MA, USA
Venue:
Proceedings of the 5th ACM workshop on HotPlanet
Year:
2013



Abstract

Social network data analysis has become increasingly important for business intelligence and online social services. Much social network data is represented in the Resource Description Framework (RDF), and SPARQL, an RDF query language, has accordingly become popular for analyzing it. As social networks grow rapidly, a SPARQL query usually involves a large quantity of data, so parallelizing its execution is desirable. MapReduce is a well-known and popular tool for big data analysis. However, the state-of-the-art translation from SPARQL queries to MapReduce jobs is inefficient because it mainly follows a two-layer rule that requires transforming SPARQL triple patterns into standard SQL joins. In this paper, we propose two primitives that enable efficient translation from SPARQL queries to MapReduce jobs: we substitute multiple-join-with-filter for the traditional SQL multiple join when feasible, and we merge different stages in the query workflow. Evaluation on social network data benchmarks shows that translation based on these two primitives achieves up to a 2x speedup in query running time compared to the traditional two-layer scheme.
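
The abstract names the multiple-join-with-filter primitive without defining it. The sketch below is one possible reading and should not be taken as the authors' implementation: it assumes a star-shaped SPARQL pattern whose triple patterns all share the subject variable, and simulates in memory a single MapReduce job that groups triples by subject and filters out subjects that miss any pattern, instead of chaining pairwise joins. The pattern list, the sample triples, and every function name are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical star-shaped pattern: every triple pattern shares the subject ?p.
# Each entry is (predicate, object); None marks a variable object.
PATTERNS = [
    ("type", "Person"),
    ("livesIn", "Amherst"),
    ("knows", None),
]

def map_triple(triple):
    """Emit (subject, (pattern_index, object)) for each pattern the triple matches."""
    s, p, o = triple
    for i, (pred, obj) in enumerate(PATTERNS):
        if p == pred and (obj is None or o == obj):
            yield s, (i, o)

def reduce_subject(subject, values):
    """Keep a subject only if every pattern matched, then emit the joined rows."""
    matches = defaultdict(list)
    for i, o in values:
        matches[i].append(o)
    if len(matches) < len(PATTERNS):  # filter step: at least one pattern is unmatched
        return
    for combo in product(*(matches[i] for i in range(len(PATTERNS)))):
        yield (subject,) + combo

def run_job(triples):
    """Simulate one MapReduce job in memory: map, shuffle by key, reduce."""
    shuffled = defaultdict(list)
    for t in triples:
        for key, value in map_triple(t):
            shuffled[key].append(value)
    rows = []
    for key in sorted(shuffled):
        rows.extend(reduce_subject(key, shuffled[key]))
    return rows

# Made-up sample data, not taken from the paper's benchmarks.
TRIPLES = [
    ("alice", "type", "Person"),
    ("alice", "livesIn", "Amherst"),
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "type", "Person"),
    ("bob", "livesIn", "Boston"),
    ("bob", "knows", "alice"),
]

RESULT = run_job(TRIPLES)
```

Under this reading, a translation that treats each triple pattern as a table and joins them pairwise would typically need one MapReduce job per join, whereas the grouped version evaluates all three patterns in a single shuffle.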