Application-transparent checkpointing in Mach 3.0/UX

  • Authors:
  • M. Russinovich;Z. Segall

  • Venue:
  • HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
  • Year:
  • 1995

Abstract

Checkpointing is perhaps the most explored of the software-based recovery techniques, yet it has typically been developed only for special-purpose or research-oriented operating systems. The paper presents virtual memory checkpointing algorithms, designed for concurrent Unix applications, that use a hard disk as the stable storage medium. These algorithms can provide the per-node checkpointing support required by a distributed computation made up of concurrent processes running on each node. Snapshot algorithm execution, during which the application is suspended, typically takes less than 10 seconds. Checkpoint commit execution, during which system performance is degraded as a checkpoint is written to disk, takes less than 45 seconds. The checkpoint-dedicated disk storage requirement for the implemented system is less than 10 MB. The implementation is based on the Mach 3.0/UX version of Unix 4.3BSD and uses Mach 3.0's external pager facility to back memory.
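
The abstract separates a short snapshot phase, during which the application is suspended, from a longer commit phase that writes the checkpoint to disk. The following is a minimal toy sketch of that two-phase split and is not the paper's algorithm: it does not use Mach's external pager, and the names PagedMemory, snapshot, commit, and load_checkpoint, the 4096-byte page size, and the on-disk layout are all assumptions made for illustration.

```python
import os
import struct
import tempfile

PAGE_SIZE = 4096                 # illustrative value, not a figure from the paper
_HEADER = struct.Struct(">Q")    # hypothetical layout: page number, then page bytes


class PagedMemory:
    """Toy stand-in for an application's address space, split into pages."""

    def __init__(self, num_pages):
        self.pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        self.dirty = set(range(num_pages))  # all pages are dirty before the first checkpoint

    def write(self, page_no, data):
        # Pad or truncate to one page and remember that the page changed.
        self.pages[page_no] = data[:PAGE_SIZE].ljust(PAGE_SIZE, b"\0")
        self.dirty.add(page_no)


def snapshot(mem):
    """Phase 1 (application suspended): capture only the pages changed since the last snapshot."""
    captured = {n: mem.pages[n] for n in mem.dirty}
    mem.dirty.clear()
    return captured


def load_checkpoint(path):
    """Read a committed checkpoint image back as {page_no: page_bytes}."""
    image = {}
    if not os.path.exists(path):
        return image
    with open(path, "rb") as f:
        while True:
            header = f.read(_HEADER.size)
            if len(header) < _HEADER.size:
                break
            (page_no,) = _HEADER.unpack(header)
            image[page_no] = f.read(PAGE_SIZE)
    return image


def commit(captured, path):
    """Phase 2 (application running again): merge captured pages into the on-disk image."""
    image = load_checkpoint(path)
    image.update(captured)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        for page_no in sorted(image):
            f.write(_HEADER.pack(page_no))
            f.write(image[page_no])
        f.flush()
        os.fsync(f.fileno())        # force the new image to disk before publishing it
    os.replace(tmp_path, path)      # atomic swap: the old checkpoint stays valid until here
```

In this sketch the suspended interval only has to copy the dirty pages in memory, while the slower disk write happens afterwards; writing to a temporary file and renaming it means a crash during commit leaves the previous checkpoint intact.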