Incorporating Fault Tolerance with Replication on Very Large Scale Grids

Authors:
Elankovan Sundararajan;Aaron Harwood;Ramamohanarao Kotagiri
Venue:
PDCAT '07 Proceedings of the Eighth International Conference on Parallel and Distributed Computing, Applications and Technologies
Year:
2007

Cited by:
Event Based Simulator for Parallel Computing over the Wide Area Network for Real Time Visualization. IVIC '09 Proceedings of the 1st International Visual Informatics Conference on Visual Informatics: Bridging Research and Practice.
Performance analysis of replication mechanism using mobile agent in computational grid using WADE. International Journal of Information and Communication Technology.

Abstract

Providing fault tolerance for message-passing parallel applications in a distributed environment is the rule rather than the exception. A single node failure can halt the entire computation, which must then be restarted from the beginning if no fault tolerance is available. However, fault tolerance adds overhead that reduces the achievable speedup. In this paper, we introduce a new technique called replication with cross-over packets that improves reliability and increases fault tolerance over Very Large Scale Grids (VLSG). The technique has the two-pronged effect of avoiding both a single point of failure and a single link failure. We incorporate this technique into the L-BSP model and show the speedup that parallel processes can achieve with it. We also derive the achievable speedup for some fundamental parallel algorithms using this technique.
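The abstract does not spell out the mechanism, so the following is only a sketch of one plausible reading, and every modelling choice in it is an assumption rather than the authors' method. It assumes that each logical process is replicated on k nodes and that "cross-over packets" means every sender replica sends a copy of the message to every receiver replica. It also assumes that nodes and links fail independently with fixed probabilities. Under those assumptions it computes, by exhaustive enumeration, the probability that one logical message is delivered. It compares this against a plain scheme where replica i talks only to replica i. The function `delivery_probability` and its parameters are hypothetical names chosen for illustration.

```python
from itertools import product


def delivery_probability(k, p_node, p_link, cross_over=True):
    """Exact probability that one logical message between two k-replicated
    processes is delivered.

    Hypothetical model: nodes fail independently with probability p_node and
    links with probability p_link. With cross_over=True, every sender replica
    has a link to every receiver replica (k*k links); otherwise only the k
    "parallel" links (i, i) exist.
    """
    if cross_over:
        links = [(i, j) for i in range(k) for j in range(k)]
    else:
        links = [(i, i) for i in range(k)]

    total = 0.0
    # Enumerate every up/down configuration of 2k nodes plus all links.
    for state in product((True, False), repeat=2 * k + len(links)):
        senders = state[:k]
        receivers = state[k:2 * k]
        link_up = dict(zip(links, state[2 * k:]))

        # Probability of this particular configuration.
        prob = 1.0
        for up in senders + receivers:
            prob *= (1 - p_node) if up else p_node
        for up in link_up.values():
            prob *= (1 - p_link) if up else p_link

        # Delivered if some live sender reaches some live receiver over a live link.
        if any(senders[i] and receivers[j] and up
               for (i, j), up in link_up.items()):
            total += prob
    return total


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k,
              delivery_probability(k, 0.1, 0.1, cross_over=False),
              delivery_probability(k, 0.1, 0.1, cross_over=True))
```

Under this model, with k = 2 and cross-over, no single node or single link failure can block delivery. In the parallel scheme, a single link failure removes one of only two paths. This is why the cross-over variant scores higher for the same failure rates. Exhaustive enumeration keeps the sketch exact but grows as 2^(2k + k^2), so it is only practical for small k.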