Grid computing development is moving towards a service-oriented architecture. As service-oriented grid computing systems gain momentum, providing integrated support for scheduling and fault tolerance becomes of paramount importance. To this end, we propose a scalable framework that loosely couples a dynamic job scheduling approach with a hybrid replication approach, so that jobs are scheduled efficiently while fault tolerance is also provided. The novelty of the proposed framework is that it uses passive replication under high system load and active replication under low system load. The switch between these two replication methods is made dynamically and transparently.
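The abstract does not say how the framework decides when load is "high" or "low". Below is a minimal Python sketch of a load-driven switch between the two replication modes. It assumes system load is reported as a utilization fraction in [0, 1]. The two thresholds, the hysteresis band between them, and the names `HybridReplicationPolicy` and `place_job` are hypothetical illustrations, not the framework's actual interface.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationMode(Enum):
    ACTIVE = "active"    # every replica executes the job concurrently
    PASSIVE = "passive"  # one primary executes; backups stand by and take over on failure


@dataclass
class HybridReplicationPolicy:
    # Hypothetical thresholds (assumption): the abstract gives no values.
    high_load: float = 0.75  # switch to passive at or above this utilization
    low_load: float = 0.5    # switch back to active at or below this utilization
    mode: ReplicationMode = ReplicationMode.ACTIVE

    def update(self, load: float) -> ReplicationMode:
        """Re-evaluate the replication mode for the current system load."""
        if not 0.0 <= load <= 1.0:
            raise ValueError("load must be a utilization fraction in [0, 1]")
        if self.mode is ReplicationMode.ACTIVE and load >= self.high_load:
            self.mode = ReplicationMode.PASSIVE
        elif self.mode is ReplicationMode.PASSIVE and load <= self.low_load:
            self.mode = ReplicationMode.ACTIVE
        # Between the two thresholds the current mode is kept (hysteresis).
        return self.mode


def place_job(nodes, mode, replicas=3):
    """Pick the least-loaded nodes for a job.

    nodes is a list of (name, load) pairs. Returns (running, standby):
    in active mode all chosen replicas run; in passive mode only the
    primary runs and the rest are held as backups.
    """
    ranked = sorted(nodes, key=lambda node: node[1])
    chosen = [name for name, _ in ranked[:replicas]]
    if mode is ReplicationMode.ACTIVE:
        return chosen, []
    return chosen[:1], chosen[1:]


if __name__ == "__main__":
    policy = HybridReplicationPolicy()
    for load in (0.3, 0.8, 0.6, 0.4):
        print(load, policy.update(load).value)
```

The sketch uses two thresholds rather than one, which is a design choice made here, not something the abstract states. A single cut-off would cause the policy to flip modes on every small fluctuation around it. Active replication consumes capacity on every replica at once, which is affordable when the system is lightly loaded. Passive replication leaves backups idle until the primary fails, which conserves resources when load is high.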