Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints
IEEE Transactions on Computers
Progressive Retry for Software Failure Recovery in Message-Passing Applications
IEEE Transactions on Computers
Fault-tolerant distributed simulation
PADS '98 Proceedings of the twelfth workshop on Parallel and distributed simulation
Abstract: The paper describes a recovery technique called progressive retry for bypassing software faults in message-passing applications. The technique is implemented as reusable modules that provide application-level software fault tolerance. The paper describes the implementation and presents results from applying progressive retry to two telecommunications systems. The results show that the technique helps reduce the total recovery time for message-passing applications.