Architecture-based fault tolerance support for grid applications

  • Authors:
  • Iman I. Yusuf;Heinz W. Schmidt;Ian D. Peake

  • Affiliations:
  • RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia

  • Venue:
  • Proceedings of the joint ACM SIGSOFT conference -- QoSA and ACM SIGSOFT symposium -- ISARCS on Quality of software architectures -- QoSA and architecting critical systems -- ISARCS
  • Year:
  • 2011

Abstract

Failure in long-running grid applications is arguably inevitable and costly. Therefore, fault tolerance (FT) support for grid applications is needed. This paper evaluates an extension of our prior work on Recovery Aware Components (RAC), a component-based FT approach. Our extension exploits the grid application architecture, classified according to a small number of architectural classes. In this paper, we evaluate the MapReduce architecture only and analyze the reliability improvement MapReduce applications would gain by adopting the RAC approach. Our analysis shows that significant increases in reliability are possible at moderate extra cost. The cost of FT depends on the failure rate of the managed system, i.e., the system to be protected from faults, and on the FT strategy chosen. Our work aims to give High Performance Computing (HPC) software architects the tools to control these factors for different grid application architectures.