DGSS: A Dependability Guided Job Scheduling System for Grid Environment

  • Authors:
  • Yongcai Tao; Hai Jin; Xuanhua Shi

  • Affiliations:
  • Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China (all three authors)

  • Venue:
  • ICCS '07 Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007
  • Year:
  • 2007

Abstract

Because of the diverse failures and error conditions in grid environments, node unavailability is becoming increasingly severe and poses great challenges to reliable job scheduling. Current job management systems rely mainly on fault recovery mechanisms to guarantee job completion, at the cost of system efficiency. To address these challenges, this paper designs a node TTF (Time To Failure) prediction model and a job completion prediction model. Based on these models, it proposes a dependability guided job scheduling system, called DGSS, which provides failure-avoidance job scheduling. The experimental results confirm improvements in both the dependability of job execution and the utilization of system resources.
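
The abstract does not describe how the two prediction models are built or how DGSS combines them, so the following is only a minimal sketch of the general idea of failure-avoidance scheduling: place a job on a node only if the node's predicted time to failure exceeds the job's predicted completion time there. The names `Node`, `predicted_ttf`, `speed`, `predicted_completion`, `pick_node`, and `margin`, as well as the work-divided-by-speed completion estimate, are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    name: str
    predicted_ttf: float  # hypothetical: predicted seconds until this node next fails
    speed: float          # hypothetical: work units the node processes per second


def predicted_completion(job_work: float, node: Node) -> float:
    """Toy completion-time estimate (an assumption): work divided by node speed."""
    return job_work / node.speed


def pick_node(job_work: float, nodes: Sequence[Node], margin: float = 1.0) -> Optional[Node]:
    """Return the fastest-finishing node expected to outlive the job, or None.

    A node qualifies when its predicted TTF is at least `margin` times the
    predicted completion time of the job on that node.
    """
    candidates = [
        n for n in nodes
        if n.predicted_ttf >= margin * predicted_completion(job_work, n)
    ]
    if not candidates:
        return None  # no node is predicted to survive long enough
    return min(candidates, key=lambda n: predicted_completion(job_work, n))


nodes = [
    Node("a", predicted_ttf=50.0, speed=10.0),
    Node("b", predicted_ttf=500.0, speed=5.0),
    Node("c", predicted_ttf=1000.0, speed=2.0),
]
chosen = pick_node(1000.0, nodes)
```

In this toy setup node `a` is the fastest but is predicted to fail before the job would finish, so the scheduler skips it in favor of a slower node that is expected to stay up. A recovery-based scheduler would instead start on `a` and pay for a restart after the failure, which is the efficiency cost the abstract refers to.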