Recently, a Checkpointing and Communication Library (CCL) supporting optimistic parallel simulation on Myrinet-based clusters has been presented. Beyond classical low-latency message delivery, this library offers CPU-offloaded checkpointing based on the data transfer capabilities of the programmable DMA engine on board Myrinet network cards. A re-synchronization functionality is also supported, for both logical (i.e., data consistency) and practical (i.e., hardware contention) reasons. It is implemented according to the following semantics: at any re-synchronization point, the simulation application is momentarily frozen until the last activated DMA-based checkpoint operation has completed. If long freezing periods occur, the checkpointing functionalities offered by CCL might not be fully effective in reducing the real checkpointing overhead at the simulation application level. To tackle this drawback, we present an alternative semantics for re-synchronization, namely conditional checkpoint abort, under which the application is frozen only if at least a threshold fraction of the state vector currently being checkpointed has already been transferred into the checkpoint buffer. Otherwise, the checkpoint operation is aborted and the simulation application is immediately allowed to proceed, thus avoiding excessive checkpointing overhead (due to freezing) at the simulation application level. We also report the results of an evaluation, carried out using classical parameterized synthetic benchmarks, which shows that the execution speed of the simulation application can be significantly increased by the alternative semantics we propose.
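The two re-synchronization rules described above can be contrasted in a short sketch. This is an illustrative model only, not the CCL API: the `DmaCheckpoint` class, the function names, and the byte-count representation of transfer progress are all assumptions introduced here, and the threshold value used in the example is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class DmaCheckpoint:
    """Hypothetical model of one in-flight DMA-based checkpoint operation."""
    state_size: int   # bytes in the state vector being checkpointed
    transferred: int  # bytes already copied into the checkpoint buffer

    def fraction_done(self) -> float:
        """Fraction of the state vector already in the checkpoint buffer."""
        return self.transferred / self.state_size


def resync_always_freeze(ckpt: DmaCheckpoint) -> str:
    """Original rule: the application always waits for the checkpoint to finish."""
    return "freeze"


def resync_conditional_abort(ckpt: DmaCheckpoint, threshold: float) -> str:
    """Conditional checkpoint abort (assumed decision logic).

    Freeze only if at least `threshold` of the state vector has already
    been transferred; otherwise abort the checkpoint and let the
    application proceed immediately.
    """
    if ckpt.fraction_done() >= threshold:
        return "freeze"
    return "abort"


# Example with an arbitrary threshold of 0.5:
nearly_done = DmaCheckpoint(state_size=1000, transferred=800)
barely_started = DmaCheckpoint(state_size=1000, transferred=100)
decisions = (
    resync_conditional_abort(nearly_done, 0.5),
    resync_conditional_abort(barely_started, 0.5),
)
```

In this sketch, a checkpoint that is mostly complete is allowed to finish because the remaining freeze is short, while one that has barely started is discarded rather than stalling the application. How the threshold should be chosen is not stated in the abstract.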