An Adaptive Programming Model for Fault-Tolerant Distributed Computing
IEEE Transactions on Dependable and Secure Computing
Journal of Parallel and Distributed Computing
LADC'05 Proceedings of the Second Latin-American conference on Dependable Computing
Exploiting partitioned synchrony to implement accurate failure detectors
International Journal of Critical Computer-Based Systems
The ability to adapt dynamically to changing run-time conditions is an important concern when designing distributed systems in which a negotiated quality of service (QoS) cannot always be delivered between processes. Providing fault tolerance in such dynamic environments is a challenging task. In this context, this paper proposes an adaptive model for fault-tolerant distributed computing. The model encompasses both the synchronous model (where there are time bounds on processing speed and message delay) and the asynchronous model (where there are no such bounds). To illustrate what can be done in this model and how to use it, the consensus problem is taken as a benchmark. An implementation of the model is also described. This implementation relies on a negotiated QoS for channels, each of which can be either timely or untimely. Moreover, the QoS of a channel can be lost during execution (i.e., dynamically changed from timely to untimely), which introduces uncertainty into the system.
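The notion of a channel whose QoS can degrade from timely to untimely can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names Channel, QoS, degrade and interpret_silence are hypothetical, and the sketch assumes that the peer sends periodic messages and that a timely channel comes with a known delay bound. Under these assumptions, silence beyond the bound on a timely channel can be read as a crash, whereas silence on an untimely channel tells the observer nothing certain.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QoS(Enum):
    TIMELY = "timely"      # message delay is bounded by a known value
    UNTIMELY = "untimely"  # no bound on message delay is assumed


@dataclass
class Channel:
    """Hypothetical view of a channel to one peer process."""
    peer: str
    qos: QoS = QoS.UNTIMELY
    delay_bound: Optional[float] = None  # seconds; only meaningful while TIMELY

    def degrade(self) -> None:
        """Model the loss of the negotiated QoS (timely -> untimely)."""
        self.qos = QoS.UNTIMELY
        self.delay_bound = None


def interpret_silence(channel: Channel, silent_for: float) -> str:
    """Classify a peer that has been silent for `silent_for` seconds.

    Assumption for this sketch: the peer is expected to send a message
    at least once per delay bound while the channel is timely.
    """
    if channel.qos is QoS.TIMELY and channel.delay_bound is not None:
        # Bounded delay: exceeding the bound can only be explained by a crash.
        return "crashed" if silent_for > channel.delay_bound else "not suspected"
    # Unbounded delay: a slow message and a crash are indistinguishable.
    return "uncertain"
```

For example, a channel created as Channel("p2", QoS.TIMELY, 0.5) yields "crashed" after one second of silence, but the same observation yields "uncertain" once degrade() has been called on it.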