Evaluating recovery aware components for grid reliability

  • Authors:
  • Iman I. Yusuf; Heinz W. Schmidt; Ian D. Peake

  • Affiliations:
  • RMIT University, Melbourne, Australia; RMIT University, Melbourne, Australia; RMIT University, Melbourne, Australia

  • Venue:
  • Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
  • Year:
  • 2009

Abstract

Failure in grids is costly and inevitable. Existing fault tolerance (FT) mechanisms are typically defensive and reactive, and thus unnecessarily costly. In this paper we propose a hybrid FT approach, the recovery aware component (RAC), which combines reactive and proactive FT and supports failure recovery or failure aversion at a user-defined granularity. This is achieved through component orientation and architecture-level reasoning about FT, with the aim of increasing reliability and availability without needless performance sacrifices. We model and analyse a parameterised RAC implementation that combines prediction, proactive rejuvenation and reactive restarting to varying extents, and we calculate cost savings, reliability improvements and cost-benefit under parameters such as prediction frequency and accuracy.
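The abstract does not state the RAC cost model itself. Purely as an illustration of the kind of trade-off it describes, the Python sketch below assumes a much simpler, hypothetical model: a failure predictor with a given recall (the fraction of real failures it anticipates) and a given number of false alarms per period, where each anticipated failure is averted by a proactive rejuvenation and each missed failure costs a reactive restart. All names (`RacParams`, `hybrid_cost`, `break_even_recall`, and so on) are invented for this sketch, prediction frequency is not modelled separately, and none of this should be read as the authors' actual analysis.

```python
from dataclasses import dataclass


@dataclass
class RacParams:
    """Hypothetical parameters for an illustrative hybrid FT cost model (not the paper's)."""
    failures_per_period: float      # expected number of real failures per period
    recall: float                   # fraction of real failures the predictor anticipates (0..1)
    false_alarms_per_period: float  # predictions that do not correspond to a real failure
    rejuvenation_cost: float        # cost of one proactive rejuvenation
    restart_cost: float             # cost of one reactive restart after a failure


def reactive_only_cost(p: RacParams) -> float:
    # Purely reactive FT: every failure is handled by a restart.
    return p.failures_per_period * p.restart_cost


def hybrid_cost(p: RacParams) -> float:
    # Anticipated failures are averted by rejuvenation; missed failures still
    # need a restart; every false alarm triggers an unnecessary rejuvenation.
    predicted = p.recall * p.failures_per_period
    missed = p.failures_per_period - predicted
    rejuvenations = predicted + p.false_alarms_per_period
    return rejuvenations * p.rejuvenation_cost + missed * p.restart_cost


def savings(p: RacParams) -> float:
    # Positive values mean the hybrid strategy is cheaper than reactive-only.
    return reactive_only_cost(p) - hybrid_cost(p)


def break_even_recall(p: RacParams) -> float:
    # Recall at which hybrid and reactive-only costs are equal, obtained by
    # solving r * F * (C_restart - C_rejuv) = FA * C_rejuv for r.
    # A result above 1.0 means prediction never pays off under these costs.
    gain_per_averted_failure = p.restart_cost - p.rejuvenation_cost
    if gain_per_averted_failure <= 0 or p.failures_per_period <= 0:
        raise ValueError("rejuvenation must be cheaper than restart and failures must occur")
    return (p.false_alarms_per_period * p.rejuvenation_cost
            / (p.failures_per_period * gain_per_averted_failure))


if __name__ == "__main__":
    params = RacParams(failures_per_period=10, recall=0.8,
                       false_alarms_per_period=2, rejuvenation_cost=1.0,
                       restart_cost=5.0)
    print(reactive_only_cost(params))  # 50.0
    print(hybrid_cost(params))         # 20.0
    print(savings(params))             # 30.0
    print(break_even_recall(params))   # 0.05
```

With these made-up numbers the hybrid strategy costs 20 units against 50 for reactive-only. The break-even recall shows how inaccurate prediction (many false alarms, low recall) can erase the benefit, which is the sort of sensitivity to prediction accuracy the abstract says the paper analyses.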