SPARQL basic graph pattern processing with iterative MapReduce

Authors:
Jaeseok Myung;Jongheum Yeon;Sang-goo Lee
Affiliations:
Seoul National University, Seoul, Republic of Korea;Seoul National University, Seoul, Republic of Korea;Seoul National University, Seoul, Republic of Korea
Venue:
Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
Year:
2010

Abstract

There have been a number of approaches that adopt the RDF data model and the MapReduce framework for data warehousing, since the data model is well suited to data integration and the framework to large-scale, fault-tolerant data analysis. Nevertheless, most approaches consider the data model and the framework separately. It has been difficult to create synergy between them because only a few algorithms connect the data model and the framework. In this paper, we offer a general and efficient MapReduce algorithm for processing a SPARQL Basic Graph Pattern, which is a set of triple patterns to be joined. In MapReduce, the join operation is known to require computationally expensive MapReduce iterations. For this reason, we minimize the number of iterations in two ways. First, we adapt the traditional multi-way join to MapReduce instead of using multiple individual joins. Second, by analyzing a given query, we select a good join key to avoid unnecessary iterations. As a result, the algorithm shows good performance and scalability in terms of time and data size.