Improving Gang Scheduling through job performance analysis and malleability

  • Authors:
  • Julita Corbalan;Xavier Martorell;Jesus Labarta

  • Affiliations:
  • Universitat Politècnica de Catalunya (UPC), c/ Jordi Girona 1-3, 08034, Barcelona, Spain; Universitat Politècnica de Catalunya (UPC), c/ Jordi Girona 1-3, 08034, Barcelona, Spain; Universitat Politècnica de Catalunya (UPC), c/ Jordi Girona 1-3, 08034, Barcelona, Spain

  • Venue:
  • ICS '01 Proceedings of the 15th international conference on Supercomputing
  • Year:
  • 2001

Abstract

The OpenMP programming model gives parallel applications a very important feature: job malleability. Job malleability is the capacity of an application to dynamically adapt its parallelism to the number of processors allocated to it. We believe that job malleability gives applications the flexibility that a system needs to achieve its maximum performance. We also argue that a system must base its decisions not only on user requirements but also on run-time performance measurements, to ensure the efficient use of resources. Job malleability is the application characteristic that makes run-time performance analysis possible. Without malleability, applications would not be able to adapt their parallelism to the system's decisions. To support these ideas, we present two new approaches to the two main problems of Gang Scheduling: an excessive number of time slots and fragmentation. Our first proposal is to apply a scheduling policy inside each time slot of Gang Scheduling that distributes processors among applications according to their efficiency, calculated from run-time measurements. We call this policy Performance-Driven Gang Scheduling. Our second approach is a new re-packing algorithm, Compress&Join, that exploits job malleability. This algorithm modifies the processor allocation of running applications to adapt it to the system's needs and to minimize fragmentation and the number of time slots. These proposals have been implemented on an SGI Origin 2000 with 64 processors. Results show the validity and convenience of both ideas: using job performance analysis calculated at run time to decide the processor allocation, and using a flexible programming model that adapts applications to system decisions.
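
The idea of distributing the processors of one time slot according to measured efficiency can be illustrated with a minimal sketch. This is not the policy from the paper, whose details the abstract does not give. It is a simple greedy heuristic written under stated assumptions: the names `Job`, `speedup`, `efficiency`, `allocate_slot` and the `target_efficiency` threshold are hypothetical, and an Amdahl-style model stands in for real run-time measurements.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    requested: int          # processors the user asked for
    serial_fraction: float  # ASSUMPTION: stand-in for run-time measurements


def speedup(job: Job, procs: int) -> float:
    """Amdahl-style model, used only as a placeholder for measured speedup."""
    return 1.0 / (job.serial_fraction + (1.0 - job.serial_fraction) / procs)


def efficiency(job: Job, procs: int) -> float:
    """Speedup per processor at a given allocation."""
    return speedup(job, procs) / procs


def allocate_slot(jobs, total_procs, target_efficiency=0.7):
    """Greedy, hypothetical distribution of one time slot's processors.

    Every job starts with one processor. Each remaining processor goes to
    the job with the largest speedup gain, provided that job stays at or
    above target_efficiency and within its request. Processors that no job
    can use efficiently are left unallocated.
    """
    if len(jobs) > total_procs:
        raise ValueError("more jobs than processors in this slot")

    alloc = {job.name: 1 for job in jobs}
    free = total_procs - len(jobs)

    while free > 0:
        candidates = [
            job for job in jobs
            if alloc[job.name] < job.requested
            and efficiency(job, alloc[job.name] + 1) >= target_efficiency
        ]
        if not candidates:
            break  # nobody would use an extra processor efficiently
        best = max(
            candidates,
            key=lambda j: speedup(j, alloc[j.name] + 1) - speedup(j, alloc[j.name]),
        )
        alloc[best.name] += 1
        free -= 1

    return alloc


if __name__ == "__main__":
    slot_jobs = [Job("A", requested=8, serial_fraction=0.0),
                 Job("B", requested=8, serial_fraction=0.5)]
    print(allocate_slot(slot_jobs, total_procs=8))
```

In this example, job B scales poorly under the assumed model, so it keeps a single processor while the well-scaling job A receives the rest. The sketch relies on malleability: it presumes each job can run correctly with fewer processors than it requested.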