Resiliency in exascale systems and computations using chaotic-identity maps

  • Authors:
  • Nageswara S. V. Rao

  • Affiliations:
  • Oak Ridge National Laboratory, Oak Ridge, TN

  • Venue:
  • Euro-Par'12: Proceedings of the 18th International Conference on Parallel Processing Workshops
  • Year:
  • 2012

Abstract

For exascale computing systems, we propose (i) lightweight computational modules that use chaotic computations and customized identity maps to detect component failures, and (ii) statistical estimation methods that generate robustness estimates for the system and its computations from the module outputs. The diagnosis modules execute multiple Poincaré and identity maps, each customized to detect certain classes of failures in the compute nodes and interconnects. The statistical methods generate robustness estimates for the system from the outputs of pipelined chains of diagnosis modules.
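The following Python code is a minimal sketch of the chaotic-identity idea, written under stated assumptions. It is not the author's construction. It uses a discretized Arnold cat map on an integer torus as the chaotic map (the abstract refers to Poincaré maps in general). The map is applied n times and then its inverse n times, so a fault-free run is an exact identity. A failure is simulated by flipping one bit. For the statistical step, a one-sided Hoeffding bound on the per-run failure probability stands in for the paper's estimation methods. The modulus `MOD` and the functions `cat_map`, `cat_map_inverse`, `chaotic_identity_check`, and `failure_upper_bound` are hypothetical names.

```python
import math

# Hypothetical lattice size for the integer torus (assumption, not from the paper).
MOD = 2**31 - 1


def cat_map(x, y, mod=MOD):
    """Discretized Arnold cat map: (x, y) -> (2x + y, x + y) mod m.

    The matrix [[2, 1], [1, 1]] has determinant 1, so the map is invertible mod m.
    """
    return (2 * x + y) % mod, (x + y) % mod


def cat_map_inverse(x, y, mod=MOD):
    """Inverse map using the matrix [[1, -1], [-1, 2]]."""
    return (x - y) % mod, (-x + 2 * y) % mod


def chaotic_identity_check(x0, y0, steps, fault_at=None):
    """Run the map forward `steps` times, then invert it `steps` times.

    Exact integer arithmetic makes the round trip an identity, so any
    corruption in between leaves a nonzero residual. `fault_at` (hypothetical)
    injects a single bit flip after that forward step to simulate a failure.
    Returns True if the input state is reproduced exactly.
    """
    x, y = x0, y0
    for i in range(steps):
        x, y = cat_map(x, y)
        if fault_at == i:
            x ^= 1  # simulated bit flip in a compute node's state
    for _ in range(steps):
        x, y = cat_map_inverse(x, y)
    return (x, y) == (x0, y0)


def failure_upper_bound(outcomes, delta=0.05):
    """One-sided Hoeffding upper bound on the per-run failure probability.

    With probability at least 1 - delta, the true failure rate is at most
    p_hat + sqrt(ln(1/delta) / (2k)), where k is the number of runs.
    """
    k = len(outcomes)
    p_hat = sum(1 for ok in outcomes if not ok) / k
    return min(1.0, p_hat + math.sqrt(math.log(1 / delta) / (2 * k)))


if __name__ == "__main__":
    outcomes = [chaotic_identity_check(12345, 67890, 1000) for _ in range(99)]
    outcomes.append(chaotic_identity_check(12345, 67890, 1000, fault_at=500))
    print(outcomes.count(False))  # 1
    print(failure_upper_bound(outcomes))
```

Because the composed map is linear and invertible modulo `MOD`, a perturbation injected mid-run can never cancel out on the way back. The check therefore flags every simulated bit flip in this sketch. Real failure classes, such as interconnect faults or silent corruption in floating-point units, would need the customized maps the abstract describes.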