Phase-based reboot: Reusing operating system execution phases for cheap reboot-based recovery

  • Authors:
  • Kazuya Yamakita; Hiroshi Yamada; Kenji Kono

  • Affiliations:
  • Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan; Keio University, JST CREST, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan; Keio University, JST CREST, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan

  • Venue:
  • DSN '11 Proceedings of the 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks
  • Year:
  • 2011

Abstract

Although operating systems (OSes) are crucial to achieving high availability of computer systems, modern OSes are far from bug-free. Rebooting the OS is simple, powerful, and sometimes the only remedy for kernel failures. Once we accept reboot-based recovery as a fact of life, we should ensure that the downtime caused by reboots is as short as possible. This paper presents "phase-based" reboots, which shorten the downtime caused by reboot-based recovery. The key idea is to divide a boot sequence into phases: a phase-based reboot reuses the system state from the previous boot if the next boot would reproduce the same state. A prototype of the phase-based reboot was implemented on Xen 3.4.1 running para-virtualized Linux 2.6.18. Experiments with the prototype show that it successfully recovered from transient kernel failures inserted by a fault injector, and that its downtime was 34.3% to 93.6% shorter than that of normal reboot-based recovery.
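
As a rough illustration of the key idea, the Python sketch below models a boot sequence as an ordered list of phases and estimates the downtime when the state of the longest reusable prefix of phases is restored instead of re-executed. It is a toy model, not the authors' Xen-based prototype: the `Phase` class, the `reproducible` flag, the `restore_seconds` parameter, the phase names, and all timings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    boot_seconds: float  # hypothetical time to execute this phase from scratch
    reproducible: bool   # assumption: True if the next boot would reach the same state


def normal_reboot_downtime(phases):
    """Downtime when every phase is executed again."""
    return sum(p.boot_seconds for p in phases)


def recovery_downtime(phases, restore_seconds):
    """Return (downtime, resume_index) for a phase-based reboot.

    The state saved after the longest prefix of reproducible phases is
    restored (costing restore_seconds); only the remaining phases run again.
    """
    reusable = 0
    for phase in phases:
        if not phase.reproducible:
            break
        reusable += 1

    remaining = sum(p.boot_seconds for p in phases[reusable:])
    if reusable == 0:
        # Nothing can be reused, so this degenerates to a normal reboot.
        return remaining, 0
    return restore_seconds + remaining, reusable


if __name__ == "__main__":
    # Hypothetical phases and timings, not measurements from the paper.
    phases = [
        Phase("hardware and kernel initialization", 10.0, True),
        Phase("daemon startup", 20.0, True),
        Phase("application startup", 15.0, False),
    ]
    normal = normal_reboot_downtime(phases)
    phased, resume_at = recovery_downtime(phases, restore_seconds=3.0)
    print(f"normal reboot: {normal:.1f} s")
    print(f"phase-based reboot: {phased:.1f} s, resuming at phase {resume_at}")
```

In this model the saving grows with the share of boot time spent in phases whose state can be reused, and it shrinks as the cost of restoring saved state rises. How reproducibility is actually determined and how state is saved and restored are left to the paper itself.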