Fault Tolerant Wide-Area Parallel Computing
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
A Problem-Specific Fault-Tolerance Mechanism for Asynchronous, Distributed Systems
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Efficient task replication and management for adaptive fault tolerance in mobile Grid environments
Future Generation Computer Systems - Special section: Information engineering and enterprise architecture in distributed computing environments
A task replication and fair resource management scheme for fault tolerant grids
EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing
Integrating fault-tolerant techniques into the design of critical systems
ISARCS'10 Proceedings of the First international conference on Architecting Critical Systems
As part of the Legion metacomputing project, we have developed a reflective model, the Reflective Graph & Event (RGE) model, for incorporating functionality into applications. In this paper we apply the RGE model to the problem of making applications more robust to failure. RGE encourages system developers to express fault-tolerance algorithms as transformations on the data structures that represent computations (messages and methods), which enables the construction of generic, reusable fault-tolerance components. We illustrate the expressive power of the RGE model by encapsulating the following fault-tolerance techniques in RGE components: two-phase commit distributed checkpointing, passive replication, pessimistic method logging, and forward recovery.
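The idea of treating a fault-tolerance technique as a reusable component that operates on messages can be illustrated with a small sketch. The code below is not the Legion or RGE API. Every name in it (`Message`, `EventDispatcher`, `PessimisticLogger`, `Counter`) is hypothetical, and an in-memory list stands in for stable storage. It shows one of the listed techniques, pessimistic method logging, under those assumptions: each method invocation is recorded before it is delivered, so a fresh object can be rebuilt by replaying the log.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Message:
    """A method invocation represented as data (hypothetical structure)."""
    method: str
    args: Tuple = ()


class Counter:
    """Toy application object that contains no fault-tolerance code."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, n: int) -> int:
        self.value += n
        return self.value


class EventDispatcher:
    """Runs registered handlers on each message, then invokes the method."""

    def __init__(self, target: object) -> None:
        self.target = target
        self.handlers: List[Callable[[Message], None]] = []

    def on_receive(self, handler: Callable[[Message], None]) -> None:
        self.handlers.append(handler)

    def deliver(self, msg: Message):
        for handler in self.handlers:
            handler(msg)  # handlers see the message before it executes
        return getattr(self.target, msg.method)(*msg.args)


class PessimisticLogger:
    """Generic component: logs every message before it is delivered."""

    def __init__(self) -> None:
        self.log: List[Message] = []  # assumption: stands in for stable storage

    def __call__(self, msg: Message) -> None:
        self.log.append(msg)

    def replay(self, fresh_target: object) -> object:
        """Rebuild state by re-invoking the logged methods in order."""
        for msg in self.log:
            getattr(fresh_target, msg.method)(*msg.args)
        return fresh_target


# Usage: attach the logger without modifying Counter, then recover.
logger = PessimisticLogger()
dispatcher = EventDispatcher(Counter())
dispatcher.on_receive(logger)
dispatcher.deliver(Message("add", (2,)))
dispatcher.deliver(Message("add", (5,)))
recovered = logger.replay(Counter())  # simulate recovery after a failure
```

In this sketch the logger knows nothing about `Counter`; it only sees messages, which is what would let the same component be attached to other objects. A real system would also need durable storage and deterministic methods for replay to be valid.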