Supporting efficient execution in heterogeneous distributed computing environments with Cactus and Globus

  • Authors:
  • Gabrielle Allen;Thomas Dramlitsch;Ian Foster;Nicholas T. Karonis;Matei Ripeanu;Edward Seidel;Brian Toonen

  • Affiliations:
  • Max Planck Institute for Gravitational Physics, Golm, Germany;Max Planck Institute for Gravitational Physics, Golm, Germany;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL and The University of Chicago, Chicago, IL;Northern Illinois University, DeKalb, IL and Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL;The University of Chicago, Chicago, IL;Max Planck Institute for Gravitational Physics, Golm, Germany;Argonne National Laboratory, Argonne, IL

  • Venue:
  • Proceedings of the 2001 ACM/IEEE conference on Supercomputing
  • Year:
  • 2001

Abstract

Improvements in the performance of processors and networks make it both feasible and interesting to treat collections of workstations, servers, clusters, and supercomputers as integrated computational resources, or Grids. However, the highly heterogeneous and dynamic nature of such Grids can make application development difficult. Here we describe an architecture and prototype implementation for a Grid-enabled computational framework based on Cactus, the MPICH-G2 Grid-enabled message-passing library, and a variety of specialized features to support efficient execution in Grid environments. We have used this framework to perform record-setting computations in numerical relativity, running across four supercomputers and achieving scaling of 88% (1140 CPUs) and 63% (1500 CPUs). The problem size we were able to compute was about five times larger than that of any previous run. Further, we introduce and demonstrate adaptive methods that automatically adjust computational parameters at run time to dramatically increase the efficiency of a distributed Grid simulation, without modification of the application and without any knowledge of the underlying network connecting the distributed computers.