A performance study of job management systems

  • Authors:
  • Tarek El-Ghazawi;Kris Gaj;Nikitas Alexandridis;Frederic Vroman;Nguyen Nguyen;Jacek R. Radzikowski;Preeyapong Samipagdi;Suboh A. Suboh

  • Affiliations:
  • ECE Department, The George Washington University, 801 22nd Street NW, Washington, DC 20052, U.S.A. (Tarek El-Ghazawi, Nikitas Alexandridis, Frederic Vroman, Preeyapong Samipagdi, Suboh A. Suboh);ECE Department, George Mason University, 4400 University Drive, Fairfax, VA 22030, U.S.A. (Kris Gaj, Nguyen Nguyen, Jacek R. Radzikowski)

  • Venue:
  • Concurrency and Computation: Practice & Experience - Systems Performance Evaluation
  • Year:
  • 2004

Abstract

Job Management Systems (JMSs) efficiently schedule and monitor jobs in parallel and distributed computing environments. They are therefore critical for improving the utilization of expensive resources in high-performance computing systems and centers, and are an important component of Grid software infrastructure. With many JMSs available commercially and in the public domain, it is difficult to choose an optimum JMS for a given computing environment. In this paper, we present the results of the first empirical study of JMSs reported in the literature. Four commonly used systems, LSF, PBS Pro, Sun Grid Engine/CODINE, and Condor, were considered. The study revealed important strengths and weaknesses of these JMSs under different operational conditions. For example, LSF was shown to exhibit excellent throughput for a wide range of job types and submission rates. In contrast, CODINE appeared to outperform the other systems in terms of average turn-around time for small jobs, and PBS appeared to excel in terms of turn-around time for relatively larger jobs. Copyright © 2004 John Wiley & Sons, Ltd.