Benefits of Software Rejuvenation on HPC Systems

  • Authors:
  • Nichamon Naksinehaboon; Narate Taerat; Chokchai Leangsuksun; Clayton F. Chandler; Stephen L. Scott

  • Venue:
  • ISPA '10 Proceedings of the International Symposium on Parallel and Distributed Processing with Applications
  • Year:
  • 2010

Abstract

Rejuvenation is a technique expected to mitigate failures in HPC systems by replacing, repairing, or resetting system components. Because software rejuvenation incurs only a small overhead, we focus primarily on OS/kernel rejuvenation. In this paper, we propose three rejuvenation scheduling techniques and investigate the claim that software rejuvenation prolongs the time to failure in HPC systems. We also compare the computing time lost under the checkpoint/restart mechanism with and without rejuvenation after each checkpoint.
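
The comparison described in the last sentence of the abstract can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes Weibull-distributed failure times (shape greater than 1 stands in for software aging), a system that is as good as new after a restart or a rejuvenation, and no failures during restart or rejuvenation. The function name `wasted_time` and every parameter value are hypothetical.

```python
import random


def wasted_time(n_segments, tau, ckpt, restart, shape, scale,
                rejuv=None, seed=0):
    """Simulate one job and return wall-clock time not spent on useful work.

    n_segments: number of compute segments, each of length tau
    ckpt:       cost of writing one checkpoint
    restart:    cost of recovering after a failure
    shape/scale: Weibull parameters of the time to failure (assumed model)
    rejuv:      cost of rejuvenating after each checkpoint, None to disable
    """
    rng = random.Random(seed)
    done = 0      # segments whose checkpoint completed
    wall = 0.0    # elapsed wall-clock time
    age = 0.0     # time since the system was last as good as new
    # random.weibullvariate takes (scale, shape) in that order
    ttf = rng.weibullvariate(scale, shape)
    while done < n_segments:
        segment = tau + ckpt
        if age + segment > ttf:
            # Failure before the checkpoint completes: the partial
            # segment is lost and the job restarts from the last checkpoint.
            wall += (ttf - age) + restart
            age = 0.0
            ttf = rng.weibullvariate(scale, shape)
            continue
        wall += segment
        age += segment
        done += 1
        if rejuv is not None:
            # Rejuvenation resets the system age at a fixed cost.
            wall += rejuv
            age = 0.0
            ttf = rng.weibullvariate(scale, shape)
    return wall - n_segments * tau


if __name__ == "__main__":
    params = dict(n_segments=200, tau=100.0, ckpt=5.0, restart=10.0,
                  shape=2.0, scale=500.0)
    runs = range(200)
    plain = sum(wasted_time(seed=s, **params) for s in runs) / len(runs)
    fresh = sum(wasted_time(seed=s, rejuv=2.0, **params) for s in runs) / len(runs)
    print(f"mean lost time without rejuvenation: {plain:.1f}")
    print(f"mean lost time with rejuvenation:    {fresh:.1f}")
```

Under this toy model, whether rejuvenation reduces the lost time depends on the assumed shape parameter and on the rejuvenation cost relative to the checkpoint interval; the numbers it prints say nothing about the paper's actual results.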