An Evaluation of Communication Factors on an Adaptive Control Strategy for Job Co-allocation in Multiple HPC Clusters

  • Authors:
  • Jinhui Qin; Michael A. Bauer

  • Venue:
  • ICPADS '09 Proceedings of the 2009 15th International Conference on Parallel and Distributed Systems
  • Year:
  • 2009

Abstract

Allocating multi-process jobs across multiple connected high performance computing clusters, i.e., job co-allocation, offers the possibility of more efficient use of computing resources, reduced turnaround time, and computations that use more processes than there are processors on any single cluster. The effectiveness of co-allocation ultimately depends on the inter-cluster communication cost. We previously introduced a scalable co-allocation strategy, the Maximum Bandwidth Adjacent cluster Set (MBAS) strategy, which uses two thresholds to control job co-allocation: one governing inter-cluster links and one controlling job partitioning. We subsequently introduced the Adaptive Threshold Control System (ATCS), which uses a fuzzy control approach to adjust these thresholds dynamically within MBAS. Results suggested that using ATCS during MBAS job co-allocation could improve overall performance. However, those results considered only jobs with either master-slave or all-to-all communication among their constituent processes. In this paper, we extend the analysis to jobs that exhibit 2D-mesh communication patterns and evaluate ATCS further.
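
To illustrate why the communication pattern matters for co-allocation, the following minimal sketch counts how many communicating process pairs cross a cluster boundary under master-slave, all-to-all, and 2D-mesh patterns. It is not the MBAS or ATCS algorithm and is not taken from the paper. The function names, the 16-process job, the even split across two clusters, the row-major rank layout, and the non-periodic mesh are all assumptions made for illustration only.

```python
from itertools import combinations


def master_slave_pairs(n):
    """Process 0 acts as master; every other process talks only to it."""
    return {(0, i) for i in range(1, n)}


def all_to_all_pairs(n):
    """Every process communicates with every other process."""
    return set(combinations(range(n), 2))


def mesh_2d_pairs(rows, cols):
    """Row-major ranks on a non-periodic grid; links to right and lower neighbours."""
    pairs = set()
    for r in range(rows):
        for c in range(cols):
            rank = r * cols + c
            if c + 1 < cols:
                pairs.add((rank, rank + 1))      # horizontal neighbour
            if r + 1 < rows:
                pairs.add((rank, rank + cols))   # vertical neighbour
    return pairs


def cross_cluster_pairs(pairs, cluster_of):
    """Count communicating pairs whose two processes sit on different clusters."""
    return sum(1 for a, b in pairs if cluster_of[a] != cluster_of[b])


# Hypothetical job: 16 processes, ranks 0-7 on cluster 0 and ranks 8-15 on cluster 1.
n = 16
cluster_of = [0 if rank < 8 else 1 for rank in range(n)]

counts = {
    "master-slave": cross_cluster_pairs(master_slave_pairs(n), cluster_of),
    "all-to-all": cross_cluster_pairs(all_to_all_pairs(n), cluster_of),
    "2D-mesh (4x4)": cross_cluster_pairs(mesh_2d_pairs(4, 4), cluster_of),
}

for pattern, crossing in counts.items():
    print(f"{pattern}: {crossing} inter-cluster pairs")
```

Under these assumptions, the same partition puts 8 master-slave pairs, 64 all-to-all pairs, and 4 mesh pairs on the inter-cluster link. Pair counts are only a rough proxy for communication cost, since they ignore message sizes and link bandwidth.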