Oivos: Simple and Efficient Distributed Data Processing

  • Authors: Steffen Viken Valvåg; Dag Johansen

  • Venue: HPCC '08 Proceedings of the 2008 10th IEEE International Conference on High Performance Computing and Communications
  • Year: 2008

Abstract

The complexity of implementing large-scale distributed computations has motivated new programming models. Google's MapReduce model has gained widespread use and aims to hide the complex details of data partitioning and distribution, scheduling, synchronization, and fault tolerance. However, our experience in the enterprise search business indicates that many real-life applications must be implemented as a collection of related MapReduce programs. Since the execution of these programs must be monitored and coordinated externally, several issues concerning scheduling, synchronization, and fault tolerance resurface. To address these limitations, we introduce Oivos, a high-level declarative programming model and its underlying runtime. We show how Oivos programs may specify computations that span multiple heterogeneous and interdependent data sets, how the programs are compiled and optimized, and how our runtime orchestrates and monitors their distributed execution. Our experimental evaluation reveals that Oivos programs perform less I/O and execute significantly faster than the equivalent sequences of MapReduce passes.