Planning spatial workflows to optimize grid performance

  • Authors: Luiz Meyer; James Annis; Mike Wilde; Marta Mattoso; Ian Foster

  • Affiliations: Federal University of Rio de Janeiro; Fermilab, Experimental Astrophysics; Argonne National Laboratory; Federal University of Rio de Janeiro; Argonne National Laboratory

  • Venue: Proceedings of the 2006 ACM Symposium on Applied Computing
  • Year: 2006

Abstract

In many scientific workflows, particularly those that operate on spatially oriented data, jobs that process adjacent regions of space often reference large numbers of files in common. When such workflows are scheduled by planning algorithms that are unaware of the application's file reference pattern, the result is a huge number of redundant file transfers between Grid sites and, consequently, poor performance. This work presents a generalized approach to planning spatial workflow schedules for Grid execution based on the spatial proximity of files and the spatial range of jobs. We evaluate our solution using the file access pattern of an astronomy application that performs co-addition of images from the Sloan Digital Sky Survey. We show that, in initial tests on Grids of 5 to 25 sites, our spatial clustering approach eliminates 50% to 90% of the file transfers between Grid sites relative to the next-best planning algorithms we tested that were not "spatially aware". At moderate levels of concurrent file transfer, this reduction of redundant network I/O improves application execution time by 30% to 70% and reduces Grid network and storage overhead. The approach is broadly applicable to a wide range of spatially oriented problems.
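
The effect described in the abstract can be illustrated with a toy model. The sketch below is not the authors' planner and does not reproduce their measurements; it only shows why placing spatially adjacent jobs at the same site reduces transfers. Everything in it is an assumption made for illustration: jobs lie on a one-dimensional strip, the hypothetical `files_for_job` access pattern has job `j` read files `j` through `j + overlap - 1`, each site fetches a given file at most once, and the names `round_robin`, `spatial_blocks`, and `count_transfers` are invented here.

```python
from typing import Dict, List, Set

Plan = Dict[int, List[int]]  # site id -> list of job ids assigned to it


def files_for_job(job: int, overlap: int = 3) -> Set[int]:
    """Assumed access pattern: job j reads files j .. j + overlap - 1,
    so neighbouring jobs share overlap - 1 files."""
    return set(range(job, job + overlap))


def count_transfers(plan: Plan, overlap: int = 3) -> int:
    """Count transfers, assuming each site fetches each distinct file once."""
    total = 0
    for jobs in plan.values():
        needed: Set[int] = set()
        for job in jobs:
            needed |= files_for_job(job, overlap)
        total += len(needed)
    return total


def round_robin(num_jobs: int, num_sites: int) -> Plan:
    """Spatially unaware baseline: deal jobs out to sites in turn."""
    plan: Plan = {site: [] for site in range(num_sites)}
    for job in range(num_jobs):
        plan[job % num_sites].append(job)
    return plan


def spatial_blocks(num_jobs: int, num_sites: int) -> Plan:
    """Spatially aware placement: each site gets one contiguous run of jobs."""
    plan: Plan = {site: [] for site in range(num_sites)}
    for job in range(num_jobs):
        plan[job * num_sites // num_jobs].append(job)
    return plan


if __name__ == "__main__":
    baseline = count_transfers(round_robin(100, 5))
    clustered = count_transfers(spatial_blocks(100, 5))
    print(f"round robin: {baseline} transfers")
    print(f"spatial blocks: {clustered} transfers")
    print(f"reduction: {1 - clustered / baseline:.0%}")
```

In this toy setting, with 100 jobs on 5 sites, the round-robin plan never places two overlapping jobs at the same site, so every file reference becomes a transfer, while the contiguous plan lets each site reuse the files shared by neighbouring jobs. The size of the saving depends entirely on the assumed overlap and job layout and should not be read as a reproduction of the figures reported in the abstract.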