Transparent checkpoint-restart of multiple processes on commodity operating systems

  • Authors:
  • Oren Laadan;Jason Nieh

  • Affiliations:
  • Department of Computer Science, Columbia University;Department of Computer Science, Columbia University

  • Venue:
  • ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

The ability to checkpoint a running application and restart it later can provide many useful benefits including fault recovery, advanced resources sharing, dynamic load balancing and improved service availability. However, applications often involve multiple processes which have dependencies through the operating system. We present a transparent mechanism for commodity operating systems that can checkpoint multiple processes in a consistent state so that they can be restarted correctly at a later time. We introduce an efficient algorithm for recording process relationships and correctly saving and restoring shared state in a manner that leverages existing operating system kernel functionality. We have implemented our system as a loadable kernel module and user-space utilities in Linux. We demonstrate its ability on real-world applications to provide transparent checkpoint-restart functionality without modifying, recompiling, or relinking applications, libraries, or the operating system kernel. Our results show checkpoint and restart times 3 to 55 times faster than OpenVZ and 5 to 1100 times faster than Xen.