Checkpointing and Communication Library (CCL) is recently developed software that implements CPU-offloaded checkpointing in support of optimistic parallel simulation on Myrinet clusters. Specifically, CCL implements a non-blocking execution mode for the memory-to-memory data copy associated with checkpoint operations, based on the data transfer capabilities of a programmable DMA engine on board Myrinet network cards. Re-synchronization between CPU and DMA activities is sometimes required for several reasons, such as maintaining data consistency, and this adds some overhead to non-blocking checkpoint operations that would otherwise cost no CPU time. In this paper we present a cost model for non-blocking checkpointing and derive a performance-effective re-synchronization semantic, which we call minimum cost re-synchronization (MC). With this semantic, each occurrence of re-synchronization either commits the ongoing DMA-based checkpoint operation (suspending CPU activities) or aborts it (possibly increasing the expected rollback cost, because fewer checkpoints are committed), whichever the cost model predicts to have the lower expected overhead. We have implemented MC within CCL, and we report experimental results demonstrating the performance benefits of this optimized re-synchronization semantic, in terms of increased execution speed, for a Personal Communication System (PCS) simulation application.
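The commit-or-abort choice described above can be illustrated with a minimal sketch. It is not the CCL implementation and not the paper's cost model, which the abstract does not spell out. The names `ResyncCosts`, `remaining_copy_time`, `extra_rollback_cost`, and `mc_decision` are hypothetical, and the sketch assumes that both expected overheads have already been estimated in the same time unit.

```python
from dataclasses import dataclass


@dataclass
class ResyncCosts:
    # Hypothetical inputs; the real cost model would have to supply them.
    remaining_copy_time: float  # CPU stall if we wait for the DMA copy to finish
    extra_rollback_cost: float  # expected rise in rollback cost if the checkpoint is dropped


def mc_decision(costs: ResyncCosts) -> str:
    """Pick the re-synchronization action with the lower expected overhead.

    Ties go to "commit"; that tie-break is an assumption of this sketch.
    """
    if costs.remaining_copy_time <= costs.extra_rollback_cost:
        return "commit"  # suspend the CPU until the checkpoint completes
    return "abort"       # discard the in-flight checkpoint and resume at once


if __name__ == "__main__":
    print(mc_decision(ResyncCosts(remaining_copy_time=2.0, extra_rollback_cost=5.0)))
    print(mc_decision(ResyncCosts(remaining_copy_time=5.0, extra_rollback_cost=2.0)))
```

The point of the sketch is only the shape of the decision: a re-synchronization event compares two expected overheads and takes the cheaper action, rather than always committing or always aborting.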