Taking a global snapshot in the absence of a global clock is a challenging problem in distributed systems. It becomes harder when the communication channels are non-FIFO, because messages may be delivered out of the order in which they were sent. Multiple concurrent initiators complicate the situation further. In this paper, we present a global snapshot collection algorithm that supports multiple initiators over non-FIFO communication channels. We show that the algorithm takes a unique, consistent global snapshot over non-FIFO channels and terminates with O(mn^2) message complexity, where m is the number of concurrent initiators and n is the number of processes in the system.