Efficient scheduling of scientific workflows in a high performance computing cluster

  • Authors:
  • Roger S. Barga; Dan Fay; Dean Guo; Steven Newhouse; Yogesh Simmhan; Alex Szalay

  • Affiliations:
  • Microsoft Research, Redmond, WA, USA; Microsoft Research, Redmond, WA, USA; Microsoft Research, Redmond, WA, USA; Microsoft Corporation, Redmond, WA, USA; Microsoft Research, Redmond, WA, USA; The Johns Hopkins University, Baltimore, MD, USA

  • Venue:
  • CLADE '08 Proceedings of the 6th international workshop on Challenges of large applications in distributed environments
  • Year:
  • 2008

Abstract

The scientific computing community, especially in academia, clearly needs technology to handle and organize the 1 to 100+ terabyte datasets coming from computer simulations and scientific instrumentation. In this paper we briefly describe GrayWulf, an exemplar cluster for data-intensive applications using SQL Server and HPC Clusters. One of the key software components of GrayWulf is Trident, a scientific workflow workbench that performs automatic scheduling of workflows across the cluster. We examine the challenges of scheduling workflows on GrayWulf, discuss algorithms to improve performance, and present early results from applying Trident to schedule data-loading workflows on GrayWulf for an actual e-Science project.