Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids

  • Authors:
  • Michael Albrecht;Patrick Donnelly;Peter Bui;Douglas Thain

  • Affiliations:
  • University of Notre Dame;University of Notre Dame;University of Wisconsin - Eau Claire;University of Notre Dame

  • Venue:
  • Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies
  • Year:
  • 2012

Abstract

In recent years, there has been renewed interest in languages and systems for large-scale distributed computing. Unfortunately, most systems available to end users rely on a custom description language tightly coupled to a specific runtime implementation, making it difficult to move applications between systems. To address this problem, we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or workflow description. Makeflow allows any user familiar with basic Unix Make syntax to generate a workflow and run it on one of many supported execution systems. Furthermore, to assess the performance characteristics of the available execution engines and help users select one, we introduce Workbench, a suite of benchmarks designed for analyzing common workflow patterns. We evaluate Workbench on two physical architectures -- the first a storage cluster with local disks and a slower network, and the second a high-performance computing cluster with a central parallel filesystem and a fast network -- using a variety of execution engines. We conclude by demonstrating three applications that use Makeflow to run data-intensive workloads consisting of thousands of jobs.
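
To make the "basic Unix Make syntax" point concrete, the following is a minimal sketch of how a workflow description in that style might be generated programmatically. It is not taken from the paper: the file names (`input.dat`, `part.N`, `out.N`, `result.dat`), the helper scripts (`split.sh`, `process.sh`), and the functions `make_rule` and `build_workflow` are all hypothetical. The sketch assumes only the ordinary Make rule shape (targets, a colon, sources, then a tab-indented command), with each rule listing the files its command reads and writes.

```python
def make_rule(targets, sources, command):
    """Format one Make-style rule: 'targets: sources', then a tab-indented command."""
    return f"{' '.join(targets)}: {' '.join(sources)}\n\t{command}\n"


def build_workflow(n_parts):
    """Return the text of a hypothetical split / process / merge workflow."""
    parts = [f"part.{i}" for i in range(n_parts)]
    outs = [f"out.{i}" for i in range(n_parts)]

    # One rule splits the (hypothetical) input into n_parts pieces.
    rules = [make_rule(parts, ["input.dat", "split.sh"],
                       f"./split.sh input.dat {n_parts}")]

    # One independent rule per piece; these are the jobs that can run in parallel.
    for part, out in zip(parts, outs):
        rules.append(make_rule([out], [part, "process.sh"],
                               f"./process.sh {part} > {out}"))

    # A final rule merges all partial outputs into a single result.
    rules.append(make_rule(["result.dat"], outs,
                           f"cat {' '.join(outs)} > result.dat"))
    return "\n".join(rules)


if __name__ == "__main__":
    print(build_workflow(3))
```

Because the generated text names only files and commands, nothing in it refers to a particular cluster, cloud, or grid; this is the property the abstract relies on when it says the same description can run on different execution engines unchanged.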