A reset of a distributed system is safe if it does not complete "prematurely," i.e., without having reset some process in the system. Safe resets are possible in the presence of certain faults, such as process fail-stops and repairs, but are not always possible in the presence of more general faults, such as arbitrary transients. In this paper, we design a bounded-memory distributed-reset program that possesses two tolerances: (1) in the presence of fail-stops and repairs, it always executes resets safely, and (2) in the presence of a finite number of transient faults, it eventually executes resets safely. Designing this multitolerance in the reset program introduces the novel concern of designing a safety detector that is itself multitolerant. A broad application of our multitolerant safety detector is to make any total program likewise multitolerant.
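
The safety condition stated above can be illustrated with a minimal sketch. This is not the reset program or the safety detector from the paper; it only spells out the predicate "a reset does not complete prematurely" over a snapshot of process states. The names `Process`, `up`, `reset_done`, and `reset_is_safe` are hypothetical, and the choice to ignore fail-stopped processes when checking safety is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Process:
    """Hypothetical snapshot of one process during a reset wave."""
    pid: int
    up: bool = True            # False models a fail-stopped process
    reset_done: bool = False   # True once this process has been reset


def reset_is_safe(processes: Iterable[Process], completed: bool) -> bool:
    """Check the safety condition on a snapshot.

    A reset that has not completed cannot have completed prematurely,
    so it is trivially safe. A completed reset is safe only if every
    process that is up has been reset. Skipping fail-stopped processes
    is an assumption of this sketch, not a claim about the paper.
    """
    if not completed:
        return True
    return all(p.reset_done for p in processes if p.up)


if __name__ == "__main__":
    procs = [Process(0, reset_done=True), Process(1, reset_done=False)]
    # Completing now would skip process 1, so the reset would be unsafe.
    print(reset_is_safe(procs, completed=True))
```

Evaluating such a predicate on a global snapshot is straightforward; the difficulty the abstract points to is detecting it in a distributed program whose own state may be corrupted by transient faults, which is why the detector itself has to be multitolerant.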