Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids

Authors:
Michael Albrecht;Patrick Donnelly;Peter Bui;Douglas Thain
Affiliations:
University of Notre Dame;University of Notre Dame;University of Wisconsin - Eau Claire;University of Notre Dame
Venue:
Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies
Year:
2012

Abstract

In recent years, there has been renewed interest in languages and systems for large-scale distributed computing. Unfortunately, most systems available to the end user use a custom description language tightly coupled to a specific runtime implementation, making it difficult to transfer applications between systems. To address this problem, we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or workflow description. Makeflow allows any user familiar with basic Unix Make syntax to generate a workflow and run it on one of many supported execution systems. Furthermore, to assess the performance characteristics of the various execution engines and help users select one, we introduce Workbench, a suite of benchmarks designed for analyzing common workflow patterns. We evaluate Workbench with a variety of execution engines on two physical architectures: a storage cluster with local disks and a slower network, and a high-performance computing cluster with a central parallel filesystem and a fast network. We conclude by demonstrating three applications that use Makeflow to execute data-intensive workloads consisting of thousands of jobs.
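
To make the portability idea concrete, the following is a minimal Python sketch of how Make-style rules can describe a workflow independently of the engine that runs it. It is not Makeflow's implementation: the names `Rule`, `parse_rules`, `schedule`, `run`, and `WORKFLOW` are hypothetical, the parser assumes every rule has exactly one tab-indented command line, and the two "engines" only record what they would submit.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    targets: List[str]   # files the command produces
    sources: List[str]   # files the command needs
    command: str         # shell command, kept as an opaque string


def parse_rules(text: str) -> List[Rule]:
    """Parse a tiny subset of Make syntax (assumption: one command per rule)."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    rules = []
    for i in range(0, len(lines), 2):
        targets, _, sources = lines[i].partition(":")
        rules.append(Rule(targets.split(), sources.split(), lines[i + 1].strip()))
    return rules


def schedule(rules: List[Rule]) -> List[Rule]:
    """Order rules so each one comes after the rules that produce its sources."""
    producer = {t: i for i, r in enumerate(rules) for t in r.targets}
    done, order = set(), []

    def visit(i: int, stack: frozenset) -> None:
        if i in done:
            return
        if i in stack:
            raise ValueError("dependency cycle detected")
        for src in rules[i].sources:
            if src in producer:          # sources nobody produces are plain inputs
                visit(producer[src], stack | {i})
        done.add(i)
        order.append(rules[i])

    for i in range(len(rules)):
        visit(i, frozenset())
    return order


def run(rules: List[Rule], submit: Callable[[Rule], None]) -> None:
    """Hand each rule to an engine; the workflow description is unchanged."""
    for rule in schedule(rules):
        submit(rule)


WORKFLOW = """
# Hypothetical three-rule workflow in Make-like syntax.
out.txt: part1.txt part2.txt
\tcat part1.txt part2.txt > out.txt
part1.txt: input.txt
\thead -n 1 input.txt > part1.txt
part2.txt: input.txt
\ttail -n 1 input.txt > part2.txt
"""

# Two stand-in engines that only log what they would execute.
local_log: List[str] = []
batch_log: List[str] = []
rules = parse_rules(WORKFLOW)
run(rules, lambda r: local_log.append(r.command))
run(rules, lambda r: batch_log.append(r.command))
```

The same parsed rules are passed to both stand-in engines without modification, which is the property the abstract describes: the workflow description stays fixed while the execution backend varies. A real engine would replace the lambdas with code that submits each command to a local shell or a batch system.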