Enabling large-scale scientific workflows on petascale resources using MPI master/worker

  • Authors:
  • Mats Rynge, Scott Callaghan, Ewa Deelman, Gideon Juve, Gaurang Mehta, Karan Vahi, Philip J. Maechling

  • Affiliations:
  • University of Southern California (all authors)

  • Venue:
  • Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond
  • Year:
  • 2012

Abstract

Computational scientists often need to execute large, loosely coupled parallel applications, such as workflows and bags of tasks, to do their research. These applications are typically composed of many short-running serial tasks, which together frequently demand large amounts of computation and storage. To produce results in a reasonable amount of time, scientists would like to execute these applications using petascale resources. In the past, this has been a challenge because petascale systems are not designed to execute such workloads efficiently. In this paper we describe a new approach to executing large, fine-grained workflows on distributed petascale systems. Our solution involves partitioning the workflow into independent subgraphs and then submitting each subgraph as a self-contained MPI job to the available resources (often remote). We describe how the partitioning and job management have been implemented in the Pegasus Workflow Management System. We also explain how this approach provides an end-to-end solution for challenges related to system architecture, queue policies and priorities, and application reuse and development. Finally, we describe how the system is being used to enable the execution of a very large seismic hazard analysis application on XSEDE resources.
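
The approach described in the abstract can be illustrated with a minimal Python sketch. It is not the Pegasus implementation: it assumes that "independent subgraphs" can be approximated by the weakly connected components of the task graph, and it runs tasks in-process where a real MPI master would hand each ready task to an idle worker. The names partition, run_subgraph, and run_task are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict, deque


def partition(tasks, edges):
    """Split a task DAG into weakly connected components.

    Assumption: components that share no edges can run as separate jobs.
    """
    neighbors = defaultdict(set)
    for parent, child in edges:
        neighbors[parent].add(child)
        neighbors[child].add(parent)

    seen, parts = set(), []
    for task in tasks:
        if task in seen:
            continue
        component, queue = [], deque([task])
        seen.add(task)
        while queue:
            current = queue.popleft()
            component.append(current)
            for other in sorted(neighbors[current]):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        parts.append(sorted(component))
    return parts


def run_subgraph(component, edges, run_task):
    """Master loop for one subgraph: release a task once all parents finish."""
    members = set(component)
    children = defaultdict(list)
    pending = {task: 0 for task in component}  # unfinished parents per task
    for parent, child in edges:
        if parent in members and child in members:
            children[parent].append(child)
            pending[child] += 1

    ready = deque(task for task in component if pending[task] == 0)
    order = []
    while ready:
        task = ready.popleft()
        run_task(task)  # stand-in for sending the task to an idle MPI worker
        order.append(task)
        for child in children[task]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    return order


# Hypothetical five-task workflow with two independent subgraphs.
tasks = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("d", "e")]
parts = partition(tasks, edges)
orders = [run_subgraph(part, edges, lambda task: None) for part in parts]
```

In this sketch each entry of parts would correspond to one self-contained job submission, and run_subgraph plays the role of the master that tracks dependencies inside that job.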