Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems

  • Authors:
  • James Brandt; Frank Chen; Vincent De Sapio; Ann Gentile; Jackson Mayo; Philippe Pébay; Diana Roe; David Thompson; Matthew Wong

  • Venue:
  • CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
  • Year:
  • 2010

Abstract

Accurate failure prediction, combined with efficient process migration facilities that include some Cloud constructs, can enable failure avoidance in large-scale high-performance computing (HPC) platforms. In this work we demonstrate a prototype that integrates our probabilistic failure prediction system with virtualization mechanisms and techniques to provide a whole-system approach to failure avoidance. The demonstration uses a failure scenario based on a real-world HPC case study.