Checkpointing and Rollback-Recovery for Distributed Systems
IEEE Transactions on Software Engineering - Special issue on distributed systems
Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints
IEEE Transactions on Computers
IEEE Transactions on Parallel and Distributed Systems
Asynchronous recovery without using vector timestamps
Journal of Parallel and Distributed Computing
An efficient end-host architecture for cluster communication
CLUSTER '04 Proceedings of the 2004 IEEE International Conference on Cluster Computing
Hybrid checkpointing for parallel applications in cluster federations
CCGRID '04 Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid
A low-overhead non-block checkpointing algorithm for mobile computing environment
GPC'06 Proceedings of the First international conference on Advances in Grid and Pervasive Computing
Novel Crash Recovery Approach for Concurrent Failures in Cluster Federation
GPC '09 Proceedings of the 4th International Conference on Advances in Grid and Pervasive Computing
Domino-effect free crash recovery for concurrent failures in cluster federation
GPC'08 Proceedings of the 3rd international conference on Advances in grid and pervasive computing
In this paper, we address the problem of determining a recovery line for a cluster federation and propose a fast recovery algorithm to handle failures in such federations. The main feature of the proposed algorithm is that all clusters in the federation can execute it simultaneously. In addition, executing the algorithm requires far fewer trips to stable storage than some existing works do. Unlike some noted work in this area, the proposed algorithm also does not suffer from message storms.
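To make the notion of a recovery line concrete, the sketch below shows the classical rollback-propagation idea: start from each cluster's latest checkpoint and roll a receiver back whenever its restored state records a message whose send is no longer recorded (an orphan message). This is not the algorithm proposed in the paper, whose details the abstract does not give. The names `Message`, `recovery_line`, the interval numbering, and the three-cluster example are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    """Hypothetical record of one inter-cluster message.

    An event in interval i happens after checkpoint i and before
    checkpoint i + 1 of the cluster where it occurs.
    """
    sender: str
    send_interval: int
    receiver: str
    recv_interval: int


def recovery_line(latest: Dict[str, int], messages: List[Message]) -> Dict[str, int]:
    """Return, per cluster, the checkpoint index to restore.

    Starts from the latest checkpoint of every cluster (an assumed
    simplification) and rolls receivers back until no orphan message remains.
    """
    line = dict(latest)
    changed = True
    while changed:
        changed = False
        for m in messages:
            # An event is recorded in checkpoint k if it occurred before k was taken.
            recv_recorded = m.recv_interval < line[m.receiver]
            send_recorded = m.send_interval < line[m.sender]
            if recv_recorded and not send_recorded:
                # Orphan message: undo the receive by rolling the receiver back
                # to the last checkpoint taken before it.
                line[m.receiver] = m.recv_interval
                changed = True
    return line


# Illustrative scenario with three clusters, each holding checkpoints 0..2.
messages = [
    Message("A", 2, "B", 1),  # sent after A's latest checkpoint, recorded by B
    Message("B", 1, "C", 1),  # becomes an orphan once B rolls back
    Message("C", 0, "A", 1),  # send and receive both recorded, harmless
]
line = recovery_line({"A": 2, "B": 2, "C": 2}, messages)
print(line)
```

In this scenario cluster B must discard its latest checkpoint because of the first message, which in turn forces C back as well. This cascading is the behaviour that recovery algorithms for cluster federations aim to bound.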