High-performance, massively scalable distributed systems using the MapReduce software framework: the SHARD triple-store

Authors:
Kurt Rohloff; Richard E. Schantz
Affiliations:
BBN Technologies, Cambridge, MA; BBN Technologies, Cambridge, MA
Venue:
Programming Support Innovations for Emerging Distributed Applications
Year:
2010

Abstract

In this paper we discuss the use of the MapReduce software framework to address the challenge of constructing high-performance, massively scalable distributed systems. We discuss several design considerations associated with constructing complex distributed systems on MapReduce, including the difficulty of building indexes scalably. We focus on Hadoop, the most popular MapReduce implementation. Our discussion and analysis are motivated by our construction of SHARD, a massively scalable, high-performance, and robust triple-store built on top of Hadoop. We provide a general approach to constructing an information system that responds to data queries using the MapReduce software framework. We present experimental results from an early version of SHARD. We close with a discussion of hypothetical MapReduce alternatives that could be used to construct more scalable distributed computing systems.
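
The abstract gives no implementation details, so the following is only an illustrative sketch of how a query over triples might be answered with a map, shuffle, and reduce pass. It is not SHARD's code and does not use the Hadoop API: it runs in memory in plain Python, and the data, the names `match`, `map_phase`, `shuffle`, `reduce_phase`, and `answer_query`, and the `?variable` convention are all assumptions made for illustration. It also assumes every clause contains the join variable and that clauses share no other variable.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "worksFor", "BBN"),
    ("bob", "worksFor", "BBN"),
    ("alice", "livesIn", "Cambridge"),
    ("carol", "livesIn", "Boston"),
]

# Hypothetical query: who works for BBN, and where do they live?
CLAUSES = [
    ("?person", "worksFor", "BBN"),
    ("?person", "livesIn", "?city"),
]


def match(triple, clause):
    """Return variable bindings if the triple matches the clause, else None."""
    bindings = {}
    for value, term in zip(triple, clause):
        if term.startswith("?"):
            # A variable that appears twice must bind to the same value.
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return bindings


def map_phase(triples, clauses, join_var):
    """Emit (join value, (clause index, bindings)) for every matching triple."""
    for triple in triples:
        for index, clause in enumerate(clauses):
            bindings = match(triple, clause)
            if bindings is not None:
                yield bindings[join_var], (index, bindings)


def shuffle(pairs):
    """Group mapped values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, clause_count):
    """Keep only keys matched by every clause and merge their bindings."""
    results = []
    for key in sorted(groups):
        per_clause = [[] for _ in range(clause_count)]
        for index, bindings in groups[key]:
            per_clause[index].append(bindings)
        # product() yields nothing if any clause has no match for this key.
        for combo in product(*per_clause):
            merged = {}
            for bindings in combo:
                merged.update(bindings)
            results.append(merged)
    return results


def answer_query(triples, clauses, join_var):
    """Run the three phases in sequence on in-memory data."""
    pairs = map_phase(triples, clauses, join_var)
    return reduce_phase(shuffle(pairs), len(clauses))


if __name__ == "__main__":
    print(answer_query(TRIPLES, CLAUSES, "?person"))
```

In this toy run only `alice` satisfies both clauses, so the reduce step returns a single binding for `?person` and `?city`. Grouping on the join variable is what lets each key be reduced independently, which is the property a MapReduce framework relies on to spread the work across machines.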