The Resource Usage Aware Backfilling

  • Authors:
  • Francesc Guim;Ivan Rodero;Julita Corbalan

  • Affiliations:
  • Computer Architecture Department, Technical University of Catalonia (UPC), Spain;Computer Architecture Department, Technical University of Catalonia (UPC), Spain;Computer Architecture Department, Technical University of Catalonia (UPC), Spain

  • Venue:
  • Job Scheduling Strategies for Parallel Processing
  • Year:
  • 2009

Quantified Score

Hi-index 0.01

Visualization

Abstract

Job scheduling policies for HPC centers have been extensively studied in the last few years, especially backfilling based policies. Almost all of these studies have been done using simulation tools. All the existent simulators use the runtime (either estimated or real) provided in the workload as a basis of their simulations. In our previous work we analyzed the impact on system performance of considering the resource sharing (memory bandwidth) of running jobs including a new resource model in the Alvio simulator. Based on this studies we proposed the LessConsume and LessConsume Threshold resource selection policies. Both are oriented to reduce the saturation of the shared resources thus increasing the performance of the system. The results showed how both resource allocation policies shown how the performance of the system can be improved by considering where the jobs are finally allocated.Using the LessConsume Threshold Resource Selection Policy, we propose a new backfilling strategy : the Resource Usage Aware Backfilling job scheduling policy. This is a backfilling based scheduling policy where the algorithms which decide which job has to be executed and how jobs have to be backfilled are based on a different Threshold configurations. This backfilling variant that considers how the shared resources are used by the scheduled jobs. Rather than backfilling the first job that can moved to the run queue based on the job arrival time or job size, it looks ahead to the next queued jobs, and tries to allocate jobs that would experience lower penalized runtime caused by the resource sharing saturation.In the paper we demostrate how the exchange of scheduling information between the local resource manager and the scheduler can improve substantially the performance of the system when the resource sharing is considered. We show how it can achieve a close response time performance that the shorest job first Backfilling with First Fit (oriented to improve the start time for the allocated jobs) providing a qualitative improvement in the number of killed jobs and in the percentage of penalized runtime.