DGSS: A Dependability Guided Job Scheduling System for Grid Environment

Authors:
Yongcai Tao;Hai Jin;Xuanhua Shi
Affiliations:
Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China (all authors)
Venue:
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007
Year:
2007

Abstract

Because grid environments are subject to diverse failures and error conditions, node unavailability is becoming increasingly severe and poses great challenges to reliable job scheduling. Current job management systems rely mainly on fault recovery mechanisms to guarantee job completion, at the cost of system efficiency. To address these challenges, this paper designs a node TTF (Time To Failure) prediction model and a job completion prediction model. Based on these models, it proposes a dependability-guided job scheduling system, called DGSS, which provides failure-avoidance job scheduling. Experimental results validate the improvement in the dependability of job execution and in system resource utilization.
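
The abstract does not describe the internals of either model, so the following is only a minimal sketch of what failure-avoidance scheduling could look like: a job goes to a node only if that node's predicted TTF exceeds the job's predicted completion time there. The names `Node`, `predicted_ttf`, `speed`, `predicted_completion`, `pick_node`, and the `margin` parameter are all hypothetical, and the work-divided-by-speed completion estimate is a stand-in assumption, not the paper's model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    name: str
    predicted_ttf: float  # assumption: predicted seconds until this node fails
    speed: float          # assumption: relative processing speed (1.0 = baseline)


def predicted_completion(job_work: float, node: Node) -> float:
    """Stand-in completion model: seconds = work units / node speed."""
    return job_work / node.speed


def pick_node(job_work: float, nodes: Sequence[Node],
              margin: float = 1.0) -> Optional[Node]:
    """Return the fastest node predicted to stay up for the whole job.

    A node qualifies when its predicted TTF is at least `margin` times the
    predicted completion time. Returns None when no node qualifies.
    """
    candidates = [
        n for n in nodes
        if n.predicted_ttf >= margin * predicted_completion(job_work, n)
    ]
    if not candidates:
        return None
    # Among the qualifying nodes, prefer the shortest predicted run time.
    return min(candidates, key=lambda n: predicted_completion(job_work, n))
```

For example, calling `pick_node(200, nodes)` on a list of `Node` values skips any node expected to fail before the job finishes, even if that node is the fastest one. Raising `margin` above 1.0 makes the choice more conservative, at the price of rejecting more nodes.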