Today's computational grid is vulnerable to node failures, and the probability that at least one node fails grows rapidly as the grid gets larger. Several efforts have provided fault tolerance through checkpointing and message logging in conjunction with the MPI library. However, the grid itself should take an active role in handling failures. We propose a dynamically reconfigurable architecture in which applications can regroup when a failure occurs. The proposed architecture removes the single point of failure from computational grids and provides flexibility in grid configuration.
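The claim that failure probability grows with grid size can be illustrated with a minimal sketch. It assumes, purely for illustration, that nodes fail independently and with the same probability during a run. That model, the function name `prob_any_failure`, and the sample value `0.001` are assumptions and are not taken from the paper.

```python
def prob_any_failure(p_node: float, n_nodes: int) -> float:
    """Probability that at least one of n_nodes fails.

    Assumption (not from the paper): node failures are independent and
    each node fails with the same probability p_node during a run.
    """
    if not 0.0 <= p_node <= 1.0:
        raise ValueError("p_node must be in [0, 1]")
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    # P(no node fails) = (1 - p)^n, so P(at least one fails) = 1 - (1 - p)^n.
    return 1.0 - (1.0 - p_node) ** n_nodes


if __name__ == "__main__":
    # Hypothetical per-node failure probability, chosen only for illustration.
    p = 0.001
    for n in (10, 100, 1_000, 10_000):
        print(f"{n:>6} nodes: P(at least one failure) = {prob_any_failure(p, n):.4f}")
```

Under this model, even a small per-node failure probability makes a failure somewhere in a large grid close to certain. This is why relying on a single coordinating component becomes risky as the grid scales.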