Low-Latency, Concurrent Checkpointing for Parallel Programs

  • Authors:
  • K. Li; J. F. Naughton; J. S. Plank

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 1994

Abstract

Presents the results of an implementation of several algorithms for checkpointing and restarting parallel programs on shared-memory multiprocessors. The algorithms are compared according to the metrics of overall checkpointing time, overhead imposed by the checkpointer on the target program, and amount of time during which the checkpointer interrupts the target program. The best algorithm measured achieves its efficiency through a variation of copy-on-write, which allows the most time-consuming operations of the checkpoint to be overlapped with the running of the program being checkpointed.
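
To illustrate the general idea of overlapping a checkpoint with the running program, the following is a minimal sketch of page-level copy-on-write in Python. It is a toy simulation under stated assumptions, not the algorithm measured in the paper: the class name CowCheckpointer, its methods, the page layout, and the use of an explicit lock in place of hardware page protection are all hypothetical choices made for illustration.

```python
import threading


class CowCheckpointer:
    """Toy page-level copy-on-write checkpointer (illustrative only)."""

    def __init__(self, num_pages, page_size=4):
        self.pages = [[0] * page_size for _ in range(num_pages)]
        self.lock = threading.Lock()
        self.protected = set()  # pages not yet saved in the current checkpoint
        self.snapshot = {}      # page index -> contents at checkpoint time

    def _save_if_protected(self, page):
        # Caller must hold self.lock.
        if page in self.protected:
            self.snapshot[page] = list(self.pages[page])
            self.protected.discard(page)

    def write(self, page, offset, value):
        """Program-side store: preserve the old page before modifying it."""
        with self.lock:
            self._save_if_protected(page)
            self.pages[page][offset] = value

    def begin_checkpoint(self):
        """Brief interruption: mark every page protected, then return."""
        with self.lock:
            self.snapshot = {}
            self.protected = set(range(len(self.pages)))

    def copy_remaining(self):
        """Background copier: save pages the program has not touched yet."""
        for page in range(len(self.pages)):
            with self.lock:
                self._save_if_protected(page)

    def finished_snapshot(self):
        """Return the completed checkpoint as a list of page contents."""
        with self.lock:
            assert not self.protected, "checkpoint still in progress"
            return [self.snapshot[i] for i in range(len(self.pages))]


def demo():
    cp = CowCheckpointer(num_pages=3)
    cp.write(0, 0, 7)                # state before the checkpoint
    cp.begin_checkpoint()            # the only step that pauses the program
    copier = threading.Thread(target=cp.copy_remaining)
    copier.start()
    cp.write(0, 0, 99)               # program keeps running while pages are copied
    cp.write(2, 1, 5)
    copier.join()
    return cp.finished_snapshot(), cp.pages
```

In this sketch the snapshot returned by demo() reflects memory as it stood when begin_checkpoint() was called, regardless of how the copier thread and the program's later writes interleave, because whichever of the two reaches a protected page first saves its old contents. A real implementation on a shared-memory multiprocessor would presumably rely on virtual-memory page protection and write the saved pages to stable storage rather than to an in-memory dictionary.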