ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel discrete event simulation
Communications of the ACM - Special issue on simulation
Adaptive checkpointing in Time Warp
PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
Effects of the checkpoint interval on time and space in time warp
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Comparative analysis of periodic state saving techniques in time warp simulators
PADS '95 Proceedings of the ninth workshop on Parallel and distributed simulation
A case study in simulating PCS networks using Time Warp
PADS '95 Proceedings of the ninth workshop on Parallel and distributed simulation
Event sensitive state saving in time warp parallel discrete event simulations
WSC '96 Proceedings of the 28th conference on Winter simulation
Operating systems (3rd ed.): internals and design principles
Operating systems (3rd ed.): internals and design principles
An Analytical Model for Hybrid Checkpointing in Time Warp Distributed Simulation
IEEE Transactions on Parallel and Distributed Systems
Exploiting model independence for parallel PCS network simulation
PADS '99 Proceedings of the thirteenth workshop on Parallel and distributed simulation
ROSS: a high-performance, low memory, modular time warp system
PADS '00 Proceedings of the fourteenth workshop on Parallel and distributed simulation
A Cost Model for Selecting Checkpoint Positions in Time Warp Parallel Simulation
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 33nd conference on Winter simulation
Estimating rollback overhead for optimism control in Time Warp
SS '95 Proceedings of the 28th Annual Simulation Symposium
MASCOTS '01 Proceedings of the Ninth International Symposium in Modeling, Analysis and Simulation of Computer and Telecommunication Systems
Nonblocking Checkpointing for Optimistic Parallel Simulation: Description and an Implementation
IEEE Transactions on Parallel and Distributed Systems
High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Multiprogrammed non-blocking checkpoints in support of optimistic simulation on myrinet clusters
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
Checkpointing-and-Communication Library (CCL) is a recently developed software which implements CPU offloaded, non-blocking checkpointing functionalities in support of optimistic parallel simulation on myrinet clusters. This is achieved by exploiting data transfer capabilities provided by a programmable DMA engine on board of myrinet network cards. Re-synchronization between CPU and DMA activities must sometimes be employed for several reasons, such as the maintenance of data consistency, thus adding overhead to (otherwise CPU cost-free) non-blocking checkpoint operations. In this paper we present a detailed cost model for non-blocking checkpointing and derive a performance effective re-synchronization semantic which we call minimum cost re-synchronization. With this semantic, an occurrence of re-synchronization either commits an on-going DMA based checkpoint operation (causing suspension of CPU activities) or aborts the operation (with possible increase in the expected rollback cost due to a reduced amount of committed checkpoints) on the basis of a minimum overhead expectation evaluated through the cost model.