CCL v3.0: Multiprogrammed Semi-Asynchronous Checkpoints

  • Authors:
  • Francesco Quaglia; Andrea Santoro

  • Affiliations:
  • Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Salaria 113, 00198 Roma, Italy (both authors)

  • Venue:
  • Proceedings of the Seventeenth Workshop on Parallel and Distributed Simulation
  • Year:
  • 2003

Abstract

CCL (Checkpointing and Communication Library) is a recently developed software library supporting optimistic parallel simulation on Myrinet-based clusters. Beyond classical low-latency message delivery functionalities, this library implements CPU-offloaded, semi-asynchronous checkpointing functionalities based on the data transfer capabilities provided by a programmable DMA engine on board Myrinet network cards. The latest version of CCL (v2.4), designed for M2M-PCI32C Myrinet cards, only supports monoprogrammed semi-asynchronous checkpoints. This forces resynchronization between CPU and DMA activities each time a new checkpoint request must be issued at the simulation application level while the last issued one is still being carried out by the DMA engine. In this paper we present CCL v3.0, which, exploiting hardware features of the more advanced M3M-PCI64C Myrinet cards, supports multiprogrammed semi-asynchronous checkpoints. The multiprogrammed approach allows a higher degree of concurrency between checkpointing and other simulation-specific operations carried out by the CPU, with obvious benefits for performance. We also report the results of an evaluation of those benefits for the case of a personal communication system simulation application.
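
The difference between the monoprogrammed and multiprogrammed schemes can be illustrated with a toy timing model. This sketch is not CCL code and does not reflect its API: the function name `cpu_stall_time`, its parameters, and the assumption that every checkpoint takes the same DMA time are all hypothetical. It only counts how long the CPU would wait when the number of checkpoint requests allowed in flight is 1 (monoprogrammed) or larger (multiprogrammed).

```python
from collections import deque


def cpu_stall_time(work_gaps, dma_time, max_outstanding):
    """Toy model (hypothetical, not CCL's API): total CPU time spent
    waiting because too many checkpoint requests are still in flight.

    work_gaps       -- CPU work done before each checkpoint request
    dma_time        -- assumed fixed DMA time per checkpoint
    max_outstanding -- 1 models monoprogrammed, >1 multiprogrammed
    """
    now = 0.0          # CPU clock
    dma_free_at = 0.0  # time at which the DMA engine drains its queue
    pending = deque()  # completion times of in-flight requests (FIFO)
    stalled = 0.0

    for gap in work_gaps:
        now += gap  # CPU performs simulation work
        # Retire requests the DMA engine has already completed.
        while pending and pending[0] <= now:
            pending.popleft()
        # No free slot: the CPU must resynchronize with the DMA engine.
        if len(pending) >= max_outstanding:
            wait_until = pending.popleft()
            stalled += wait_until - now
            now = wait_until
        # Issue the new request; the DMA engine serves requests in order.
        start = max(now, dma_free_at)
        dma_free_at = start + dma_time
        pending.append(dma_free_at)

    return stalled


gaps = [1, 1, 1, 1]  # one unit of CPU work before each request
mono = cpu_stall_time(gaps, dma_time=3, max_outstanding=1)   # 6.0
multi = cpu_stall_time(gaps, dma_time=3, max_outstanding=4)  # 0.0
```

In this model a single in-flight slot makes the CPU wait whenever a request arrives before the previous checkpoint has finished, while a deeper queue absorbs the requests. The numbers say nothing about the performance results reported in the paper.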