The Penn State computing condominium scheduling system

  • Authors:
  • Pawan Agnihotri;Vijay K. Agarwala;Jeffrey J. Nucciarone;Kevin M. Morooney;Chita Das

  • Affiliations:
  • Penn State University, University Park, PA;Penn State University, University Park, PA;Penn State University, University Park, PA;Penn State University, University Park, PA;Penn State University, University Park, PA

  • Venue:
  • SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
  • Year:
  • 1998

Quantified Score

Hi-index 0.00

Visualization

Abstract

The Penn State RS/6000 SP is a uniquely acquired and operated computing facility. This 143 CPU machine, centrally located and jointly owned, is a result of collaboration between academic departments, research groups, and the central academic computing facility. It is the largest on-campus resource at Penn State for meeting the high performance computing needs.Due to the joint ownership structure of the machine, the job scheduling requirements are significantly different from the usual methods of job processor allocation in distributed memory parallel machines. After several years of adapting different queuing systems, primarily the Distributed Queuing System, to our needs, it became obvious that the conventional scheduling systems did not serve the machine scheduling requirements unique to the Penn State SP. We concluded that a robust and easily configurable system needs to be developed to meet our unique needs. We have drawn inspiration from and modeled our system on EASY. As with EASY, we use the application programming interface of LoadLeveler to implement our scheduler. Our scheduler is named Penn State Condominium Scheduler (PSCS). PSCS does policy implementation and job execution on the machine is done by LoadLeveler.PSCS is written to facilitate easier configuration and administration. It does not have any processor architecture dependence. It is similar to the native scheduler in LoadLeveler in this regard. PSCS has incorporated three unique features: (i) node owner affinity which ensures fairness by allocation based on ownership, (ii) backfilling which ensures efficient utilization of resources, and (iii) affinity for services provided which ensures proper matching of jobs to the processors based on memory, software and other requirements. Jobs from users who own nodes in the SP complex have affinity to those particular processors owned by them. They also have preferences granted to them depending on their ownership level. Once the demand from the node owners is met, the next important goal is to keep the machine as fully occupied with running jobs as possible. This is accomplished by backfilling. This scheduler incorporates these features which are most important to successful implementation of multi-owner, centrally located, heterogeneous computing facilities.