A Study on Job Co-Allocation in Multiple HPC Clusters

  • Authors:
  • Jinhui Qin; Michael Bauer

  • Affiliations:
  • University of Western Ontario, Canada; University of Western Ontario, Canada

  • Venue:
  • HPCS '06 Proceedings of the 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment
  • Year:
  • 2006

Abstract

To use HPC clusters more effectively for even larger computations, improve turn-around times, and better utilize compute resources, users are looking to interconnect multiple HPC clusters, creating a grid. To use such grids effectively, it may be desirable to split jobs that require many processes and co-allocate them across multiple clusters. While splitting a very large job across multiple clusters is an attractive possibility, the benefit, in terms of improved turn-around time, ultimately depends on the communication patterns between processes, the workload on the communication links, and the maximum bandwidth of the links. The objective of this work is to understand the impact of communications on multi-processor jobs in order to develop scheduling strategies and job allocation algorithms for multi-cluster grids that can accommodate communication factors. In this paper we report on initial investigations of some co-allocation strategies. This evaluation is based on a simulator that has been implemented and validated experimentally across two HPC clusters.