A Cost Model for Selecting Checkpoint Positions in Time Warp Parallel Simulation

  • Authors:
  • Francesco Quaglia

  • Affiliations:
  • Univ. di Roma “La Sapienza,”, Roma, Italy

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 2001

Quantified Score

Hi-index 0.01

Visualization

Abstract

Recent papers have shown that the performance of Time Warp simulators can be improved by appropriately selecting the positions of checkpoints, instead of taking them on a periodic basis. In this paper, we present a checkpointing technique in which the selection of the positions of checkpoints is based on a checkpointing-recovery cost model. Given the current state $S$, the model determines the convenience of recording $S$ as a checkpoint before the next event is executed. This is done by taking into account the position of the last taken checkpoint, the granularity (i.e., the execution time) of intermediate events, and using an estimate of the probability that $S$ will have to be restored due to rollback in the future of the execution. A synthetic benchmark in different configurations is used for evaluating and comparing this approach to classical periodic techniques. As a testing environment we used a cluster of PCs connected through a Myrinet switch coupled with a fast communication layer specifically designed to exploit the potential of this type of switch. The obtained results point out that our solution allows faster execution and, in some cases, exhibits the additional advantage that less memory is required for recording state vectors. This possibly contributes to further performance improvements when memory is a critical resource for the specific application. A performance study for the case of a cellular phone system simulation is finally reported to demonstrate the effectiveness of this solution for a real world application.