Computational scientists often need to execute large, loosely coupled parallel applications such as workflows and bags of tasks in order to do their research. These applications are typically composed of many short-running serial tasks that collectively demand large amounts of computation and storage. In order to produce results in a reasonable amount of time, scientists would like to execute these applications on petascale resources. In the past this has been a challenge because petascale systems are not designed to execute such workloads efficiently. In this paper we describe a new approach to executing large, fine-grained workflows on distributed petascale systems. Our solution partitions the workflow into independent subgraphs and submits each subgraph as a self-contained MPI job to the available (often remote) resources. We describe how the partitioning and job management have been implemented in the Pegasus Workflow Management System. We also explain how this approach provides an end-to-end solution for challenges related to system architecture, queue policies and priorities, and application reuse and development. Finally, we describe how the system is being used to enable the execution of a very large seismic hazard analysis application on XSEDE resources.
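The core idea in the abstract can be illustrated with a minimal sketch. The following Python snippet (a hypothetical illustration, not the Pegasus implementation; the function name and task representation are assumptions) partitions a workflow DAG into independent subgraphs by finding weakly connected components, so that each subgraph could then be wrapped as one self-contained MPI job:

```python
# Hypothetical sketch of the partitioning step described in the abstract:
# group a workflow's tasks into independent subgraphs (weakly connected
# components of the DAG), each of which could be submitted as one MPI job.
from collections import defaultdict

def partition_workflow(edges, tasks):
    """Return the independent subgraphs of a workflow.

    edges: iterable of (parent, child) task dependencies
    tasks: iterable of all task names (so isolated tasks are included)
    """
    # Build an undirected adjacency map: for connectivity purposes,
    # the direction of a dependency edge does not matter.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, partitions = set(), []
    for t in tasks:
        if t in seen:
            continue
        # Depth-first traversal to collect one connected component.
        component, stack = [], [t]
        seen.add(t)
        while stack:
            n = stack.pop()
            component.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        partitions.append(sorted(component))
    return partitions
```

In a real system each returned partition would be handed to a master-worker MPI harness on the remote resource, which executes the subgraph's tasks in dependency order; this sketch only shows how fine-grained tasks can be grouped so the batch scheduler sees a small number of large jobs instead of many short serial ones.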