Workflow task clustering for best effort systems with Pegasus

  • Authors:
  • Gurmeet Singh;Mei-Hui Su;Karan Vahi;Ewa Deelman;Bruce Berriman;John Good;Daniel S. Katz;Gaurang Mehta

  • Affiliations:
  • USC Information Sciences Institute, Marina Del Rey, CA;USC Information Sciences Institute, Marina Del Rey, CA;USC Information Sciences Institute, Marina Del Rey, CA;USC Information Sciences Institute, Marina Del Rey, CA;California Institute of Technology, Pasadena, CA;California Institute of Technology, Pasadena, CA;Louisiana State University, Baton Rouge, LA;USC Information Sciences Institute, Marina Del Rey, CA

  • Venue:
  • Proceedings of the 15th ACM Mardi Gras conference: From lightweight mash-ups to lambda grids: Understanding the spectrum of distributed computing requirements, applications, tools, infrastructures, interoperability, and the incremental adoption of key capabilities
  • Year:
  • 2008

Abstract

Many scientific workflows consist of tasks with fine computational granularity, yet they contain thousands of such tasks and are data intensive, so they require resources such as the TeraGrid to execute efficiently. To improve the performance of such applications, task clustering techniques are often employed to increase the computational granularity of workflow tasks. The goal is to minimize the completion time of the workflow by reducing the impact of queue wait times. In this paper, we examine the performance impact of clustering techniques using the Pegasus workflow management system. Experiments performed using an astronomy workflow on the NCSA TeraGrid cluster show that clustering can reduce the workflow completion time significantly (by up to 97%).
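
The idea of clustering to amortize queue wait can be illustrated with a minimal sketch. It is not the Pegasus implementation or API. The function names, the fixed per-job queue wait, and the assumption that jobs run strictly one after another are all hypothetical simplifications chosen for illustration, and the numbers are made up rather than taken from the paper's experiments.

```python
from typing import List


def cluster_tasks(runtimes: List[float], cluster_size: int) -> List[List[float]]:
    """Group independent tasks into jobs of at most cluster_size tasks each.

    Hypothetical helper: real clustering would respect workflow dependencies.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return [runtimes[i:i + cluster_size]
            for i in range(0, len(runtimes), cluster_size)]


def completion_time(jobs: List[List[float]], queue_wait: float) -> float:
    """Toy model (assumption): jobs execute one after another, and each job
    pays the queue wait once before running its tasks serially."""
    return sum(queue_wait + sum(job) for job in jobs)


# Illustrative values only: 1000 tasks of 2 s each, 60 s queue wait per job.
runtimes = [2.0] * 1000
unclustered = completion_time(cluster_tasks(runtimes, 1), queue_wait=60.0)
clustered = completion_time(cluster_tasks(runtimes, 50), queue_wait=60.0)
reduction = 1 - clustered / unclustered

print(f"unclustered: {unclustered:.0f} s, clustered: {clustered:.0f} s, "
      f"reduction: {reduction:.1%}")
```

In this toy model the computation time is unchanged; only the number of times the queue wait is paid shrinks, which is why short tasks benefit most from clustering.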