ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel discrete event simulation
Communications of the ACM - Special issue on simulation
Design and Evaluation of the Rollback Chip: Special Purpose Hardware for Time Warp
IEEE Transactions on Computers
Adaptive checkpointing in Time Warp
PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
Effects of the checkpoint interval on time and space in time warp
ACM Transactions on Modeling and Computer Simulation (TOMACS)
The treatment of state in optimistic systems
PADS '95 Proceedings of the ninth workshop on Parallel and distributed simulation
Comparative analysis of periodic state saving techniques in time warp simulators
PADS '95 Proceedings of the ninth workshop on Parallel and distributed simulation
A case study in simulating PCS networks using Time Warp
PADS '95 Proceedings of the ninth workshop on Parallel and distributed simulation
High performance messaging on workstations: Illinois fast messages (FM) for Myrinet
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Transparent incremental state saving in time warp parallel discrete event simulation
PADS '96 Proceedings of the tenth workshop on Parallel and distributed simulation
Automatic incremental state saving
PADS '96 Proceedings of the tenth workshop on Parallel and distributed simulation
Event sensitive state saving in time warp parallel discrete event simulations
WSC '96 Proceedings of the 28th conference on Winter simulation
Incremental state saving in SPEEDES using C++
WSC '93 Proceedings of the 25th conference on Winter simulation
State saving for interactive optimistic simulation
Proceedings of the eleventh workshop on Parallel and distributed simulation
An Analytical Model for Hybrid Checkpointing in Time Warp Distributed Simulation
IEEE Transactions on Parallel and Distributed Systems
Exploiting model independence for parallel PCS network simulation
PADS '99 Proceedings of the thirteenth workshop on Parallel and distributed simulation
ROSS: a high-performance, low memory, modular time warp system
PADS '00 Proceedings of the fourteenth workshop on Parallel and distributed simulation
Conditional checkpoint abort: an alternative semantic for re-synchronization in CCL
Proceedings of the sixteenth workshop on Parallel and distributed simulation
Low-Latency, Concurrent Checkpointing for Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
MASCOTS '01 Proceedings of the Ninth International Symposium in Modeling, Analysis and Simulation of Computer and Telecommunication Systems
Nonblocking Checkpointing for Optimistic Parallel Simulation: Description and an Implementation
IEEE Transactions on Parallel and Distributed Systems
PCI-DMA/CPU Handoff for Increased Effectiveness of Checkpointing Functionalities in CCL
DS-RT '03 Proceedings of the Seventh IEEE International Symposium on Distributed Simulation and Real-Time Applications
Journal of Parallel and Distributed Computing
Libckpt: transparent checkpointing under Unix
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
IEEE Transactions on Wireless Communications
Hi-index | 0.00 |
CCL (checkpointing and communication library) is a software layer in support of optimistic parallel discrete event simulation (PDES) on myrinet-based COTS clusters. Beyond classical low latency message delivery functionalities, this library implements CPU offloaded, non-blocking (asynchronous) checkpointing functionalities based on data transfer capabilities provided by a programmable DMA engine on board of myrinet network cards. These functionalities are unique since optimistic simulation systems conventionally rely on checkpointing implemented as a synchronous, CPU-based data copy. Releases of CCL up to v2.4 only support monoprogrammed non-blocking checkpoints. This forces re-synchronization between CPU and DMA activities, which is a potential source of overhead, each time a new checkpoint request must be issued at the simulation application level while the last issued one is still being carried out by the DMA engine. In this paper we present a redesigned release of CCL (v3.0) that, exploiting hardware capabilities of more advanced myrinet clusters, supports multiprogrammed non-blocking checkpoints. The multiprogrammed approach allows higher degree of concurrency between checkpointing and other simulation specific operations carried out by the CPU, with benefits on performance. We also report the results of the experimental evaluation of those benefits for the case of a Personal Communication System (PCS) simulation application, selected as a real world test-bed.