Conditional checkpoint abort: an alternative semantic for re-synchronization in CCL

  • Authors:
  • Francesco Quaglia;Andrea Santoro;Bruno Ciciani

  • Affiliations:
  • Università di Roma "La Sapienza", Roma, Italy;Università di Roma "La Sapienza", Roma, Italy;Università di Roma "La Sapienza", Roma, Italy

  • Venue:
  • Proceedings of the sixteenth workshop on Parallel and distributed simulation
  • Year:
  • 2002

Abstract

Recently, a Checkpointing and Communication Library (CCL) supporting optimistic parallel simulation on Myrinet-based clusters has been presented. Beyond classical low-latency message delivery, this library offers CPU-offloaded checkpointing based on the data transfer capabilities of a programmable DMA engine on board Myrinet network cards. A re-synchronization functionality is also supported for both logical (i.e., data consistency) and practical (i.e., hardware contention) reasons. It is implemented according to the following semantic: at any re-synchronization point, the simulation application is momentarily frozen until the last activated DMA-based checkpoint operation has completed. If long freezing periods are experienced, the checkpointing functionalities offered by CCL might not be fully effective in reducing the real checkpointing overhead at the simulation application level. To tackle this drawback, we present an alternative semantic for re-synchronization, namely conditional checkpoint abort, which freezes the application only if at least a threshold fraction of the state vector currently being checkpointed has already been transferred into the checkpoint buffer. Otherwise, the checkpoint operation is aborted and the simulation application is immediately allowed to proceed, thus avoiding excessive checkpointing overhead (due to freezing) at the simulation application level. We also report the results of an evaluation, carried out using classical parameterized synthetic benchmarks, which show that the execution speed of the simulation application can be significantly increased by the alternative semantic we propose.
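
The decision rule described in the abstract can be illustrated with a minimal Python sketch. It is not CCL's actual interface: the names `CheckpointTransfer`, `ResyncAction`, and `resync_decision`, as well as the byte-count representation of transfer progress, are assumptions introduced here purely for illustration. The sketch models only the choice made at a re-synchronization point, not the DMA transfer itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResyncAction(Enum):
    """Possible outcomes at a re-synchronization point (hypothetical names)."""
    PROCEED = "proceed"                    # no checkpoint transfer is pending
    FREEZE_UNTIL_COMPLETE = "freeze"       # wait for the DMA transfer to finish
    ABORT_CHECKPOINT = "abort"             # discard the partial checkpoint


@dataclass
class CheckpointTransfer:
    """Progress of an in-flight checkpoint (assumed to be tracked in bytes)."""
    state_size: int          # total size of the state vector
    bytes_transferred: int   # portion already copied into the checkpoint buffer

    @property
    def fraction_done(self) -> float:
        # An empty state vector is treated as trivially complete.
        if self.state_size == 0:
            return 1.0
        return self.bytes_transferred / self.state_size


def resync_decision(transfer: Optional[CheckpointTransfer],
                    threshold: float) -> ResyncAction:
    """Apply the conditional checkpoint abort rule at a re-synchronization point.

    Freeze only if at least `threshold` of the state vector has already been
    transferred; otherwise abort the checkpoint and let the application proceed.
    """
    if transfer is None or transfer.fraction_done >= 1.0:
        return ResyncAction.PROCEED
    if transfer.fraction_done >= threshold:
        return ResyncAction.FREEZE_UNTIL_COMPLETE
    return ResyncAction.ABORT_CHECKPOINT


if __name__ == "__main__":
    in_flight = CheckpointTransfer(state_size=1000, bytes_transferred=200)
    print(resync_decision(in_flight, threshold=0.5))
```

Under these assumptions, a threshold of 0 makes every pending checkpoint satisfy the condition, so the rule reduces to the original semantic in which the application always freezes until the transfer completes.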