Robust parallel job scheduling infrastructure for service-oriented grid computing systems

Authors:
J. H. Abawajy
Affiliations:
School of Information Technology, Deakin University, Geelong, VIC, Australia
Venue:
ICCSA'05 Proceedings of the 2005 international conference on Computational Science and Its Applications - Volume Part IV
Year:
2005

Abstract

Grid computing development is moving toward a service-oriented architecture. As service-oriented grid computing systems gain momentum, support for integrated scheduling and fault tolerance becomes of paramount importance. To this end, we propose a scalable framework that loosely couples a dynamic job scheduling approach with a hybrid replication approach, so that jobs are scheduled efficiently while fault tolerance is provided at the same time. The novelty of the proposed framework is that it uses passive replication under high system load and active replication under low system load. The switch between the two replication methods is made dynamically and transparently.
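
The load-dependent switch described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. The load metric (fraction of busy nodes), the 0.7 threshold, and all names (ReplicationPolicy, choose_mode, copies_to_launch) are hypothetical and chosen only for illustration. The sketch also assumes the usual meaning of the two terms: active replication runs all replicas of a job concurrently, while passive replication runs one primary copy and starts a backup only if the primary fails.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationMode(Enum):
    ACTIVE = "active"    # all replicas of a job run concurrently
    PASSIVE = "passive"  # one primary runs; backups start only on failure


@dataclass
class ReplicationPolicy:
    # Hypothetical threshold: the fraction of busy nodes at or above
    # which the system load is treated as "high".
    high_load_threshold: float = 0.7

    def choose_mode(self, busy_nodes: int, total_nodes: int) -> ReplicationMode:
        """Pick a replication mode from the current system load."""
        if total_nodes <= 0:
            raise ValueError("total_nodes must be positive")
        load = busy_nodes / total_nodes
        # High load: spare capacity is scarce, so avoid redundant copies.
        if load >= self.high_load_threshold:
            return ReplicationMode.PASSIVE
        # Low load: idle nodes can run redundant copies at little cost.
        return ReplicationMode.ACTIVE


def copies_to_launch(mode: ReplicationMode, replicas: int) -> int:
    """Number of job copies started immediately under the given mode."""
    return replicas if mode is ReplicationMode.ACTIVE else 1


policy = ReplicationPolicy()
lightly_loaded = policy.choose_mode(busy_nodes=2, total_nodes=10)
heavily_loaded = policy.choose_mode(busy_nodes=9, total_nodes=10)
```

Because the mode is recomputed from the load each time a job is scheduled, the switch happens without any action by the job submitter, which is one plausible reading of "dynamically and transparently".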