In recent years, there has been a renewed interest in languages and systems for large-scale distributed computing. Unfortunately, most systems available to the end user rely on a custom description language tightly coupled to a specific runtime implementation, making it difficult to transfer applications between systems. To address this problem we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or workflow description. Makeflow allows any user familiar with basic Unix Make syntax to write a workflow and run it on any of many supported execution systems. Furthermore, to help users assess the performance characteristics of the available execution engines and select one for their workloads, we introduce Workbench, a suite of benchmarks designed for analyzing common workflow patterns. We evaluate Workbench on two physical architectures -- the first a storage cluster with local disks and a slower network, and the second a high-performance computing cluster with a central parallel filesystem and a fast network -- using a variety of execution engines. We conclude by demonstrating three applications that use Makeflow to execute data-intensive workloads consisting of thousands of jobs.
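To illustrate the Make-style model the abstract describes, the following is a minimal sketch of a workflow in Makeflow's rule syntax: each rule names its output files, its input files, and the command that produces the outputs, and Makeflow dispatches independent rules in parallel. The file names and the `simulate.py` script here are hypothetical, not drawn from the paper.

```make
# Two independent simulation tasks; Makeflow can run them concurrently
# on whichever execution engine is selected at run time.
output.1: simulate.py
	python simulate.py 1 > output.1

output.2: simulate.py
	python simulate.py 2 > output.2

# A join step that depends on both outputs above.
result.txt: output.1 output.2
	cat output.1 output.2 > result.txt
```

Because the workflow description is engine-neutral, the same file can be submitted unchanged to different backends (for example, local multicore execution or a batch system), which is the portability property the paper emphasizes.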