High-performance, massively scalable distributed systems using the MapReduce software framework: the SHARD triple-store

  • Authors:
  • Kurt Rohloff;Richard E. Schantz

  • Affiliations:
  • BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA

  • Venue:
  • Programming Support Innovations for Emerging Distributed Applications
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we discuss the use of the MapReduce software framework to address the challenge of constructing high-performance, massively-scalable distributed systems. We discuss several design considerations associated with constructing complex distributed systems using the MapReduce software framework, including the difficulty of scalably building indexes. We focus on Hadoop, the most popular MapReduce implementation. Our discussion and analysis are motivated by our construction of SHARD, a massively scalable, high-performance and robust triple-store technology on top of Hadoop. We provide a general approach to construct an information system from the MapReduce software framework that responds to data queries. We provide experimental results generated of an early version of SHARD. We close with a discussion of hypothetical MapReduce alternatives that can be used for the construction of more scalable distributed computing systems.