Large-scale distributed systems such as supercomputers and peer-to-peer systems typically have a fully connected logical topology over a large number of processors. Existing snapshot algorithms for such systems have high response time, require a large number of messages (typically O(n^2), where n is the number of processes), or both. In this paper, we present a suite of two algorithms, simple_tree and hypercube, that are both fast and require a small number of messages, which makes them highly scalable. Simple_tree requires O(n) messages and has O(\log n) response time. Hypercube requires O(n \log n) messages and has O(\log n) response time, and additionally has the property that the roles of all processes are symmetrical. Process symmetry implies greater potential for a balanced workload and freedom from congestion. Both algorithms assume non-FIFO channels.
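The abstract states the complexity bounds but not the mechanics of either algorithm. As a rough illustration of where bounds of this shape can come from, the sketch below counts messages and rounds for two generic communication patterns: a binary-tree convergecast followed by a broadcast, and a dimension-by-dimension hypercube exchange. These patterns are assumptions chosen for illustration, not the paper's simple_tree or hypercube algorithms, and the function names `tree_convergecast` and `hypercube_exchange` are hypothetical. The sketch also ignores channel ordering and everything specific to recording a consistent snapshot.

```python
def tree_convergecast(n):
    """Count rounds and messages for an assumed heap-indexed binary tree on n processes.

    Every non-root process sends one message up to its parent, then the
    root's result is sent back down along the same edges.
    """
    depth = n.bit_length() - 1          # floor(log2 n): depth of the deepest node
    messages = (n - 1) + (n - 1)        # one message per tree edge, up and then down
    rounds = 2 * depth                  # one round per tree level, in each direction
    return rounds, messages


def hypercube_exchange(n):
    """Simulate an assumed dimension exchange among n = 2^d processes.

    In round k, process i swaps everything it knows with process i XOR 2^k.
    Every process plays the same role in every round.
    """
    d = n.bit_length() - 1
    assert 1 << d == n, "this sketch assumes n is a power of two"
    known = [{i} for i in range(n)]     # each process starts with only its own local state
    messages = 0
    for k in range(d):
        updated = []
        for i in range(n):
            partner = i ^ (1 << k)
            updated.append(known[i] | known[partner])
            messages += 1               # process i sends one message to its partner
        known = updated
    return d, messages, known


if __name__ == "__main__":
    print(tree_convergecast(8))         # (rounds, messages)
    rounds, messages, known = hypercube_exchange(8)
    print(rounds, messages, all(s == set(range(8)) for s in known))
```

Under these assumptions the tree pattern uses 2(n - 1) messages, which is O(n), over about 2 \log n rounds, while the hypercube pattern uses n \log n messages over \log n rounds and leaves every process holding all n local states. The tree concentrates the final result at the root, whereas in the exchange pattern each process sends and receives the same number of messages, which is the kind of symmetry the abstract associates with balanced workload.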