A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with GPUs

  • Authors:
  • Kittisak Sajjapongse; Xiang Wang; Michela Becchi

  • Affiliations:
  • University of Missouri, Columbia, MO, USA; University of Missouri, Columbia, MO, USA; University of Missouri, Columbia, MO, USA

  • Venue:
  • Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC '13)
  • Year:
  • 2013

Abstract

In the last few years, thanks to their computational power, their progressively increasing programmability, and their wide adoption in both the research community and industry, GPUs have become part of HPC clusters (for example, the US Titan and Stampede and the Chinese Tianhe-1A supercomputers). As a result, widely used open-source cluster resource managers (e.g., SLURM and TORQUE) have recently been extended with GPU support capabilities. These systems, however, provide simple scheduling mechanisms that often result in resource underutilization and, consequently, suboptimal performance. In this paper, we propose a runtime system that can be integrated with existing cluster resource managers to enable a more efficient use of heterogeneous clusters with GPUs. Unlike previous work, we focus on multi-process GPU applications that include inter-process synchronization (for example, hybrid MPI-CUDA applications). We discuss the limitations and inefficiencies of existing scheduling and resource sharing schemes in the presence of synchronization. We show that preemption is an effective mechanism to allow efficient scheduling of hybrid MPI-CUDA applications. We validate our runtime on a variety of benchmark programs with different computation and communication patterns.
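
The following is a minimal sketch, not taken from the paper, of the class of hybrid MPI-CUDA applications the abstract targets: each MPI rank offloads a kernel to a GPU and then joins a collective synchronization, so a rank that is still waiting for GPU access stalls all of its peers. The kernel, variable names, and problem size are illustrative assumptions, chosen only to show where the synchronization point sits.

    /* Hypothetical hybrid MPI-CUDA skeleton (not from the paper).
     * Each rank runs a GPU phase followed by an MPI collective; a rank
     * delayed while waiting for a GPU blocks every other rank at the
     * collective, which is the scenario where preemptive GPU scheduling
     * can improve utilization. Build with: nvcc -ccbin mpicxx ... */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1 << 20)  /* illustrative problem size */

    __global__ void scale(float *v, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) v[i] *= a;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        float *h = (float *)malloc(N * sizeof(float));
        for (int i = 0; i < N; i++) h[i] = 1.0f;

        float *d;
        cudaMalloc(&d, N * sizeof(float));
        cudaMemcpy(d, h, N * sizeof(float), cudaMemcpyHostToDevice);

        /* GPU phase: each rank scales its local vector on its GPU. */
        scale<<<(N + 255) / 256, 256>>>(d, 2.0f, N);
        cudaMemcpy(h, d, N * sizeof(float), cudaMemcpyDeviceToHost);

        /* Synchronization phase: a collective over all ranks. A rank
         * still queued for a GPU would delay every other rank here. */
        float local_sum = 0.0f, global_sum = 0.0f;
        for (int i = 0; i < N; i++) local_sum += h[i];
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_FLOAT, MPI_SUM,
                      MPI_COMM_WORLD);

        if (rank == 0) printf("global sum = %f\n", global_sum);

        cudaFree(d);
        free(h);
        MPI_Finalize();
        return 0;
    }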