Optimistic recovery in distributed systems
ACM Transactions on Computer Systems (TOCS)
Checkpointing and Rollback-Recovery for Distributed Systems
IEEE Transactions on Software Engineering - Special issue on distributed systems
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Optimistic Recovery in Multi-Threaded Distributed Systems
SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems
A low-overhead recovery technique using quasi-synchronous checkpointing
ICDCS '96 Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS '96)
Selective Checkpointing and Rollbacks in Multithreaded Distributed Systems
ICDCS '01 Proceedings of the The 21st International Conference on Distributed Computing Systems
Hi-index | 0.00 |
Checkpointing and recovery in traditional distributed systems is relatively well established. However, checkpointing and recovery in multithreaded distributed systems has not been studied in the literature. Using the traditional checkpointing and recovery algorithms in multithreaded systems leads to false causality problem and high checkpointing overhead. The checkpointing algorithm is implemented at the process level to reduce number of checkpoints and the recovery algorithm is implemented at the thread level which minimizes the false causality problem. The algorithm also takes advantage of the communication-induced checkpointing method to reduce the message overhead.