Failure detection and consensus in the crash-recovery model
Distributed Computing
Problem Setting. One of the most popular failure models for asynchronous fault-tolerant distributed systems is crash-stop, in which a certain number of processes may stop executing steps during the computation and never resume. Despite its theoretical interest, crash-stop is not expressive enough to model many realistic scenarios. In practice, a process may crash, but its processor reboots, and the crashed process is restarted from a recovery point and rejoins the computation. This behavior is formalized by the crash-recovery failure model, in which processes can crash and recover multiple times.