RPC-V: Toward Fault-Tolerant RPC for Internet Connected Desktop Grids with Volatile Nodes

Authors:
Samir Djilali; Thomas Herault; Oleg Lodygensky; Tangui Morlier; Gilles Fedak; Franck Cappello
Affiliations:
Université de Paris Sud; Université de Paris Sud; Université de Paris Sud; Université de Paris Sud; Université de Paris Sud; Université de Paris Sud
Venue:
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Year:
2004

Abstract

RPC is one of the programming models envisioned for the Grid. In Internet-connected large-scale Grids such as Desktop Grids, node and network failures are not rare events. This paper makes several contributions, examining the feasibility and limits of fault-tolerant RPC on these platforms. First, we characterize these Grids by their fundamental features and demonstrate that their application scope should be safely restricted to stateless services. Second, we present a new fault-tolerant RPC protocol built on an original combination of a three-tier architecture, passive replication and message logging. We describe RPC-V, an implementation of the proposed protocol within the XtremWeb Desktop Grid middleware. Third, we evaluate the performance of RPC-V and the impact of faults on execution time, using a real-life application on a Desktop Grid testbed assembling nodes in France and the USA. We demonstrate that RPC-V allows applications to continue their execution while key system components fail.
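
The abstract names three ingredients: a three-tier architecture, passive replication and message logging, applied to stateless services. The sketch below is a toy illustration of how such ingredients might fit together, not the RPC-V protocol or the XtremWeb API. Every name in it (`Coordinator`, `Client`, `CoordinatorDown`, `square`) is hypothetical, and failures are simulated with a flag rather than detected over a network.

```python
import uuid


class CoordinatorDown(Exception):
    """Raised by a simulated coordinator that cannot be reached."""


class Coordinator:
    """Hypothetical middle tier: forwards calls to a stateless service."""

    def __init__(self, name, service, alive=True):
        self.name = name
        self.service = service
        self.alive = alive
        self.results = {}  # call id -> result, so a replayed call is not recomputed

    def submit(self, call_id, args):
        if not self.alive:
            raise CoordinatorDown(self.name)
        if call_id not in self.results:
            # The service is assumed stateless, so running it again is safe.
            self.results[call_id] = self.service(*args)
        return self.results[call_id]


class Client:
    """Hypothetical client tier: logs each call, then tries replicas in order."""

    def __init__(self, coordinators):
        self.coordinators = coordinators
        self.log = {}  # call id -> args: the client-side message log

    def call(self, *args):
        call_id = str(uuid.uuid4())
        self.log[call_id] = args  # log before sending so the call can be replayed
        for coordinator in self.coordinators:
            try:
                result = coordinator.submit(call_id, self.log[call_id])
            except CoordinatorDown:
                continue  # fail over to the next (passive) replica
            del self.log[call_id]  # completed calls leave the log
            return result
        raise RuntimeError("no coordinator reachable; call stays in the log")


def square(x):
    """Stand-in for a stateless remote service."""
    return x * x


primary = Coordinator("primary", square, alive=False)  # simulated failure
backup = Coordinator("backup", square)
client = Client([primary, backup])
answer = client.call(7)
```

In this sketch the call succeeds through the backup even though the primary is down, and an unanswered call would remain in the client's log for a later replay. Restricting the service to stateless functions is what makes that replay harmless here.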