Nephele: efficient parallel data processing in the cloud

  • Authors:
  • Daniel Warneke;Odej Kao

  • Affiliations:
  • Technische Universität Berlin, Berlin, Germany;Technische Universität Berlin, Berlin, Germany

  • Venue:
  • Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
  • Year:
  • 2009

Quantified Score

Hi-index 0.01

Visualization

Abstract

In recent years Cloud Computing has emerged as a promising new approach for ad-hoc parallel data processing. Major cloud computing companies have started to integrate frameworks for parallel data processing in their product portfolio, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks which are currently used stem from the field of cluster computing and disregard the particular nature of a cloud. As a result, the allocated compute resources may be inadequate for big parts of the submitted job and unnecessarily increase processing time and cost. In this paper we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our ongoing research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's compute clouds for both, task scheduling and execution. It allows assigning the particular tasks of a processing job to different types of virtual machines and takes care of their instantiation and termination during the job execution. Based on this new framework, we perform evaluations on a compute cloud system and compare the results to the existing data processing framework Hadoop.