Social network data analysis is becoming increasingly important for business intelligence and online social services. Much social network data is represented in the Resource Description Framework (RDF), and SPARQL, an RDF query language, has accordingly become popular for social network data analysis. As social networks grow rapidly, a SPARQL query usually involves a large quantity of data, so parallelizing its execution is desirable. MapReduce is a well-known and popular tool for big data analysis. However, the state-of-the-art translation from SPARQL queries to MapReduce jobs is inefficient, mainly because it follows a two-layer rule that requires transforming each SPARQL triple pattern into a standard SQL join. In this paper, we propose two primitives that enable efficient translation from SPARQL queries to MapReduce jobs: we substitute multiple-join-with-filter for the traditional SQL multiple join when feasible, and we merge different stages in the query workflow. An evaluation on social network data benchmarks shows that translation based on these two primitives achieves up to a 2x speedup in query running time compared with the traditional two-layer scheme.
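The abstract does not spell out how the two primitives are defined, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a star-shaped query in which every triple pattern shares the same subject variable, and it evaluates all patterns in a single map/reduce pass (filtering constants in the map step) instead of chaining pairwise joins. The data, the `PATTERNS` encoding, and the function names `map_phase`, `reduce_phase`, and `star_join` are all hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("alice", "livesIn", "Paris"),
    ("alice", "age", 30),
    ("bob", "knows", "alice"),
    ("bob", "livesIn", "Rome"),
    ("bob", "age", 25),
    ("carol", "livesIn", "Paris"),
    ("carol", "age", 41),
]

# Assumed query shape: every pattern has the same subject variable ?person.
# Each entry is (predicate, object); None marks a variable object,
# any other value is a constant that acts as a filter.
PATTERNS = [
    ("knows", None),       # ?person knows ?friend
    ("livesIn", "Paris"),  # ?person livesIn "Paris"
    ("age", None),         # ?person age ?age
]


def map_phase(triples, patterns):
    """Emit (subject, (pattern_index, object)) for every matching triple.

    The constant-object filter is applied here, so non-matching triples
    never reach the shuffle.
    """
    for s, p, o in triples:
        for i, (pred, const) in enumerate(patterns):
            if p == pred and (const is None or o == const):
                yield s, (i, o)


def reduce_phase(pairs, n_patterns):
    """Group by subject and join all patterns at once."""
    groups = defaultdict(lambda: [[] for _ in range(n_patterns)])
    for key, (i, o) in pairs:
        groups[key][i].append(o)

    results = []
    for key, slots in groups.items():
        # A subject qualifies only if every pattern matched at least once.
        if all(slots):
            for combo in product(*slots):
                results.append((key,) + combo)
    return sorted(results)


def star_join(triples, patterns):
    """Run the single-pass join: one map step, one reduce step."""
    return reduce_phase(map_phase(triples, patterns), len(patterns))


if __name__ == "__main__":
    for row in star_join(TRIPLES, PATTERNS):
        print(row)
```

With three patterns, a cascade of two-way joins would need two join steps and an intermediate result; here the shared key lets one grouping step do the work, which is the kind of saving the abstract attributes to replacing multiple joins and merging workflow stages.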