Efficient scheduling of scientific workflows in a high performance computing cluster

Authors:
Roger S. Barga; Dan Fay; Dean Guo; Steven Newhouse; Yogesh Simmhan; Alex Szalay
Affiliations:
Microsoft Research, Redmond, WA, USA; Microsoft Research, Redmond, WA, USA; Microsoft Research, Redmond, WA, USA; Microsoft Corporation, Redmond, WA, USA; Microsoft Research, Redmond, WA, USA; The Johns Hopkins University, Baltimore, MD, USA
Venue:
CLADE '08 Proceedings of the 6th international workshop on Challenges of large applications in distributed environments
Year:
2008

Abstract

The scientific computing community, especially academia, clearly needs technology to handle and organize the 1 to 100+ terabyte datasets coming from computer simulations and scientific instrumentation. In this paper we briefly describe GrayWulf, an exemplar cluster for data-intensive applications built using SQL Server and HPC clusters. One of the key software components of GrayWulf is Trident, a scientific workflow workbench that performs automatic scheduling of workflows across the cluster. We examine the challenges of scheduling workflows on GrayWulf, discuss algorithms to improve performance, and present early results from applying Trident to schedule data-loading workflows on GrayWulf for an actual e-Science project.
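The abstract does not spell out the scheduling algorithms Trident uses. As a purely illustrative sketch, not the method from the paper, the Python code below assigns independent data-loading workflows to cluster nodes with the classic Longest Processing Time (LPT) greedy heuristic. It assumes each workflow's runtime is estimated in advance and that workflows have no dependencies on one another; the workflow names, runtimes, and node names (`load_a`, `n1`, and so on) are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def schedule_lpt(jobs: Dict[str, float], nodes: List[str]) -> Dict[str, List[str]]:
    """Assign independent jobs to nodes using Longest Processing Time first.

    jobs maps a hypothetical workflow name to its estimated runtime.
    Returns a mapping from node name to the list of job names placed on it.
    """
    if not nodes:
        raise ValueError("need at least one node")
    # Min-heap of (current load, node index): the least-loaded node pops first.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(len(nodes))]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {n: [] for n in nodes}
    # Place the longest jobs first; ties are broken by name for determinism.
    for name, cost in sorted(jobs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        assignment[nodes[idx]].append(name)
        heapq.heappush(heap, (load + cost, idx))
    return assignment


def makespan(jobs: Dict[str, float], assignment: Dict[str, List[str]]) -> float:
    """Finish time of the busiest node, i.e. the length of the whole schedule."""
    return max(sum(jobs[j] for j in js) for js in assignment.values())


if __name__ == "__main__":
    # Hypothetical data-loading workflows with estimated runtimes (minutes).
    jobs = {"load_a": 7, "load_b": 5, "load_c": 4, "load_d": 3, "load_e": 3}
    plan = schedule_lpt(jobs, ["n1", "n2"])
    print(plan, makespan(jobs, plan))
```

With these inputs, node `n1` receives `load_a` and `load_d` (10 minutes) and `n2` receives `load_b`, `load_c`, and `load_e` (12 minutes), whereas an optimal split finishes in 11 minutes. LPT is a fast heuristic rather than an exact solver; a real data-intensive scheduler would also need to weigh factors such as where the data resides.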