The performance of independent checkpointing in distributed systems

  • Authors:
  • P. Sens

  • Affiliations:
  • -

  • Venue:
  • HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
  • Year:
  • 1995

Quantified Score

Hi-index 0.00

Visualization

Abstract

The paper describes performance measurements of an implementation of independent checkpointing in a network of workstations. Independent checkpointing is a simple technique for providing fault tolerance in distributed systems. Because processes do not coordinate during checkpointing, this technique has a low run-time overhead. To avoid the classical domino effect, our implementation relies on a message logging mechanism. We have measured fault management overhead for different kinds of parallel applications. The costs of checkpointing are very low. However, message logging introduces a sizeable overhead. We compare these results to other works implementing different checkpointing policies, and we show that independent checkpointing is an efficient way to provide fault tolerance for long-running distributed applications composed of processes exchanging small streams of data.