A novel recovery approach for cluster federations

Authors:
Bidyut Gupta;Shahram Rahimi;Raheel Ahmad;Raja Chirra
Affiliations:
Department of Computer Science, Southern Illinois University, Carbondale, IL;Department of Computer Science, Southern Illinois University, Carbondale, IL;Department of Computer Science, Southern Illinois University, Carbondale, IL;Department of Computer Science, Southern Illinois University, Carbondale, IL
Venue:
GPC'07 Proceedings of the 2nd international conference on Advances in grid and pervasive computing
Year:
2007

Citing 7

Checkpointing and Rollback-Recovery for Distributed Systems
IEEE Transactions on Software Engineering - Special issue on distributed systems

Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints
IEEE Transactions on Computers

Theoretical Analysis for Communication-Induced Checkpointing Protocols with Rollback-Dependency Trackability
IEEE Transactions on Parallel and Distributed Systems

Asynchronous recovery without using vector timestamps
Journal of Parallel and Distributed Computing

An efficient end-host architecture for cluster communication
CLUSTER '04 Proceedings of the 2004 IEEE International Conference on Cluster Computing

Hybrid checkpointing for parallel applications in cluster federations
CCGRID '04 Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid

A low-overhead non-block checkpointing algorithm for mobile computing environment
GPC'06 Proceedings of the First international conference on Advances in Grid and Pervasive Computing

Cited 2

Novel Crash Recovery Approach for Concurrent Failures in Cluster Federation
GPC '09 Proceedings of the 4th International Conference on Advances in Grid and Pervasive Computing

Domino-effect free crash recovery for concurrent failures in cluster federation
GPC'08 Proceedings of the 3rd international conference on Advances in grid and pervasive computing

Abstract

In this paper, we address the complex problem of determining a recovery line for a cluster federation and propose a fast recovery algorithm to handle failures in cluster federations. The main feature of the proposed algorithm is that all clusters in the federation can execute it simultaneously. In addition, executing the algorithm requires far fewer trips to stable storage than some existing approaches do. Unlike some noted work in this area, the proposed algorithm also does not suffer from message storms.