Optimistic recovery in distributed systems
ACM Transactions on Computer Systems (TOCS)
ACM Transactions on Programming Languages and Systems (TOPLAS)
Rollback sometimes works...if filtered
WSC '89 Proceedings of the 21st conference on Winter simulation
Parallel discrete event simulation
Communications of the ACM - Special issue on simulation
An algorithm for minimally latent global virtual time
PADS '93 Proceedings of the seventh workshop on Parallel and distributed simulation
The local Time Warp approach to parallel simulation
PADS '93 Proceedings of the seventh workshop on Parallel and distributed simulation
Time Warp simulation in time constrained systems
PADS '93 Proceedings of the seventh workshop on Parallel and distributed simulation
Comparative analysis of periodic state saving techniques in time warp simulators
PADS '95 Proceedings of the ninth workshop on Parallel and distributed simulation
Clustered time warp and logic simulation
PADS '95 Proceedings of the ninth workshop on Parallel and distributed simulation
Automatic incremental state saving
PADS '96 Proceedings of the tenth workshop on Parallel and distributed simulation
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
How to recover efficiently and asynchronously when optimism fails
ICDCS '96 Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS '96)
Distributed Recovery with K-Optimistic Logging
ICDCS '97 Proceedings of the 17th International Conference on Distributed Computing Systems (ICDCS '97)
Fault-tolerant distributed simulation
PADS '98 Proceedings of the twelfth workshop on Parallel and distributed simulation
Optimistic parallel simulation over a network of workstations
Proceedings of the 31st conference on Winter simulation: Simulation---a bridge to the future - Volume 2
Hi-index | 0.00 |
In traditional optimistic distributed simulation protocols, a logical process (LP) receiving a straggler rolls back and sends out anti-messages. The receiver of an anti-message may also roll back and send out more anti-messages. So a single straggler may result in a large number of anti-messages and multiple rollbacks of some LPs. In the authors' protocol, an LP receiving a straggler broadcasts its rollback. On receiving this announcement, other LPs may roll back but they do not announce their rollbacks. So each LP rolls back at most once in response to each straggler. Anti-messages are not used. This eliminates the need for output queues and results in simple memory management. It also eliminates the problem of cascading rollbacks and echoing, and results in faster simulation. All this is achieved by a scheme for maintaining transitive dependency information. The cost incurred includes the tagging of each message with extra dependency information and the increased processing time upon receiving a message. They also present the similarities between the two areas of distributed simulation and distributed recovery. They show how the solutions for one area can be applied to the other area.