Towards automated HPC scheduler configuration tuning

  • Authors:
  • Diwakar Krishnamurthy; Mehrnoush Alemzadeh; Mahmood Moussavi

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Calgary, 2500 University Drive NW, Calgary, AB, Canada T2N 1N4 (all authors)

  • Venue:
  • Concurrency and Computation: Practice & Experience
  • Year:
  • 2011

Abstract

High performance computing (HPC) systems allow researchers and businesses to harness the large amounts of computing power needed to solve complex problems. In such systems, a job scheduler prioritizes the execution of users' jobs so that the system satisfies performance objectives for various groups of users while making efficient use of available resources. Typically, system administrators are responsible for manually configuring or tuning the job scheduler so that both the performance objectives of user groups and system-level performance objectives are met. Modern job schedulers used in production systems are quite complex. Through detailed trace-driven simulations, we show that manually tuning the configuration of production schedulers in an environment characterized by multiple performance objectives is very challenging and may not be feasible. To alleviate this problem, this paper describes a toolset that helps a system administrator automatically configure a scheduler so that the performance objectives for various classes of users, as well as other system-level performance objectives, can be satisfied. A unique aspect of this work that differentiates it from existing work on scheduler tuning is that it has been implemented to work with a widely used production scheduler. Furthermore, in contrast to existing work, it considers the challenging real-world problem of delivering different levels of performance to different classes of users. System administrators can exploit the toolset to react quickly to changes in performance objectives and workload conditions. Case studies using synthetic and real HPC workloads demonstrate the effectiveness of the technique. Copyright © 2011 John Wiley & Sons, Ltd.