A Cost Model for Selecting Checkpoint Positions in Time Warp Parallel Simulation

Authors:
Francesco Quaglia
Affiliations:
Univ. di Roma “La Sapienza,” Roma, Italy
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2001

Abstract

Recent papers have shown that the performance of Time Warp simulators can be improved by appropriately selecting the positions of checkpoints instead of taking them periodically. In this paper, we present a checkpointing technique in which the positions of checkpoints are selected using a checkpointing-recovery cost model. Given the current state $S$, the model determines whether it is convenient to record $S$ as a checkpoint before the next event is executed. It does so by taking into account the position of the last checkpoint taken, the granularity (i.e., the execution time) of intermediate events, and an estimate of the probability that $S$ will have to be restored because of a future rollback. We evaluate the approach on a synthetic benchmark in different configurations and compare it with classical periodic techniques. The testing environment is a cluster of PCs connected through a Myrinet switch, coupled with a fast communication layer specifically designed to exploit the potential of this type of switch. The results show that our solution allows faster execution and, in some cases, has the additional advantage of requiring less memory for recording state vectors, which may contribute to further performance improvements when memory is a critical resource for the specific application. Finally, a performance study of a cellular phone system simulation is reported to demonstrate the effectiveness of this solution for a real-world application.
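
The decision described in the abstract can be illustrated with a small sketch. It is not the cost model from the paper, whose equations are not given here. It assumes a deliberately simplified rule: record $S$ when the expected cost of replaying the events executed since the last checkpoint exceeds the cost of saving the state. The class name, its fields, the constant save cost, and the externally supplied restore probability are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckpointSelector:
    """Toy checkpoint-position selector (illustrative assumption, not the paper's model)."""

    save_cost: float                 # assumed constant time to record one state vector
    replay_since_last: float = 0.0   # summed granularity of events since the last checkpoint

    def should_checkpoint(self, p_restore: float) -> bool:
        # If S is not recorded and later has to be restored, recovery must
        # re-execute the events since the last checkpoint. The expected extra
        # cost is p_restore * replay_since_last; saving S costs save_cost.
        return p_restore * self.replay_since_last > self.save_cost

    def step(self, p_restore: float, event_granularity: float) -> bool:
        """Decide on a checkpoint for the current state, then account for the next event."""
        took_checkpoint = self.should_checkpoint(p_restore)
        if took_checkpoint:
            # The current state becomes the new last checkpoint.
            self.replay_since_last = 0.0
        # The next event is executed; its granularity adds to the replay cost.
        self.replay_since_last += event_granularity
        return took_checkpoint


selector = CheckpointSelector(save_cost=50.0)
# Six events of granularity 30 each, with an assumed restore probability of 0.5.
decisions = [selector.step(p_restore=0.5, event_granularity=30.0) for _ in range(6)]
```

Under these assumed numbers, a checkpoint is taken only once enough event time has accumulated since the last one, so coarser events or a higher restore probability lead to more frequent checkpoints.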