RPC is one of the programming models envisioned for the Grid. In Internet-connected large-scale Grids such as Desktop Grids, node and network failures are not rare events. This paper makes several contributions by examining the feasibility and limits of fault-tolerant RPC on these platforms. First, we characterize these Grids by their fundamental features and demonstrate that, to remain safe, the scope of their applications should be restricted to stateless services. Second, we present a new fault-tolerant RPC protocol based on an original combination of a three-tier architecture, passive replication, and message logging. We describe RPC-V, an implementation of the proposed protocol within the XtremWeb Desktop Grid middleware. Third, we evaluate the performance of RPC-V and the impact of faults on execution time, using a real-life application on a Desktop Grid testbed that assembles nodes in France and the USA. We demonstrate that RPC-V allows applications to continue their execution while key system components fail.
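The abstract names the ingredients of the protocol (a client/coordinator/worker three-tier architecture, passive replication, and message logging) without showing how they cooperate. What follows is a minimal single-process Python sketch under stated assumptions. It is not RPC-V's code or XtremWeb's API, and every class and method name in it (`CoordinatorGroup`, `checkpoint`, `store_result`, and so on) is hypothetical. It assumes that the primary coordinator copies its state to backups only at checkpoints, that clients log requests until their results arrive, and that workers log results and resend them to whichever coordinator is currently primary. Because the service is stateless, replaying a request is harmless, which illustrates why the restriction to stateless services matters.

```python
# Hypothetical sketch, not RPC-V's implementation; all names are illustrative.
from dataclasses import dataclass, field


def square(x: int) -> int:
    # Hypothetical stateless service: same input, same output, no side effects.
    return x * x


@dataclass
class Coordinator:
    name: str
    alive: bool = True
    tasks: dict = field(default_factory=dict)    # call_id -> argument
    results: dict = field(default_factory=dict)  # call_id -> result


class CoordinatorGroup:
    """Passive (primary-backup) replication: the first live coordinator serves
    all requests; backups only receive its state at each checkpoint()."""

    def __init__(self, names):
        self.members = [Coordinator(n) for n in names]

    def primary(self) -> Coordinator:
        for c in self.members:
            if c.alive:
                return c
        raise RuntimeError("all coordinators failed")

    def checkpoint(self):
        p = self.primary()
        for c in self.members:
            if c is not p and c.alive:
                c.tasks, c.results = dict(p.tasks), dict(p.results)

    def submit(self, call_id, arg):
        p = self.primary()
        if call_id not in p.results:
            p.tasks.setdefault(call_id, arg)  # replays are deduplicated by call_id

    def store_result(self, call_id, result):
        p = self.primary()
        p.results.setdefault(call_id, result)
        p.tasks.pop(call_id, None)

    def pending(self):
        return dict(self.primary().tasks)

    def fetch(self, call_id):
        return self.primary().results.get(call_id)


class Worker:
    def __init__(self):
        self.log = {}  # call_id -> result; never pruned in this sketch
        self.executions = 0

    def run(self, group):
        for call_id, arg in group.pending().items():
            if call_id not in self.log:  # reuse results computed before a crash
                self.log[call_id] = square(arg)
                self.executions += 1
        for call_id, result in self.log.items():  # (re)send every logged result
            group.store_result(call_id, result)


class Client:
    def __init__(self):
        self.log = {}  # call_id -> argument, kept until the result arrives

    def call(self, group, call_id, arg):
        self.log[call_id] = arg
        group.submit(call_id, arg)

    def collect(self, group):
        done = {}
        for call_id, arg in list(self.log.items()):
            result = group.fetch(call_id)
            if result is None:
                group.submit(call_id, arg)  # may have been lost in a crash: replay
            else:
                done[call_id] = result
                del self.log[call_id]
        return done


group, client, worker = CoordinatorGroup(["coord-A", "coord-B"]), Client(), Worker()
client.call(group, "c1", 3)
client.call(group, "c2", 4)
group.checkpoint()              # coord-B now knows about c1 and c2
client.call(group, "c3", 5)     # c3 is known only to coord-A
worker.run(group)               # results are stored on coord-A only
group.members[0].alive = False  # coord-A crashes before the next checkpoint
print(client.collect(group))    # {}: coord-B has no results; client replays c1..c3
worker.run(group)               # worker resends its logged results to coord-B
print(client.collect(group))    # {'c1': 9, 'c2': 16, 'c3': 25}
print(worker.executions)        # 3: nothing was re-executed after the crash
```

In this sketch the crash of coord-A loses request c3 and all three results on the coordinator tier. The client and worker logs are what allow coord-B to rebuild that state. Failure detection (here reduced to the `alive` flag) and log pruning are deliberately left out.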