Evaluating recovery aware components for grid reliability

Authors:
Iman I. Yusuf;Heinz W. Schmidt;Ian D. Peake
Affiliations:
RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia
Venue:
Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Year:
2009

Abstract

Failure in grids is costly and inevitable. Existing fault tolerance (FT) mechanisms are typically defensive and reactive, and thus unnecessarily costly. In this paper we propose a hybrid FT approach, the recovery aware component (RAC), which combines reactive and proactive FT. Through component orientation and architecture-level reasoning about FT, RAC supports failure recovery or aversion at user-defined granularity, increasing reliability and availability without needless performance sacrifices. We model and analyse a parameterised RAC implementation that combines prediction, proactive rejuvenation and reactive restarting to varying extents, and we calculate cost savings, reliability improvements and cost-benefit under parameters such as prediction frequency and accuracy.
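
The trade-off the abstract describes can be illustrated with a toy expected-cost comparison. This is only a sketch under stated assumptions and is not the model analysed in the paper: the function names, the per-interval prediction cost, the hit rate and false-alarm rate used to stand in for "prediction accuracy", and all numeric values are hypothetical.

```python
def reactive_cost(failures, restart_cost):
    """Cost of a purely reactive scheme: every failure is paid for with a restart."""
    return failures * restart_cost


def hybrid_cost(failures, quiet_intervals, hit_rate, false_alarm_rate,
                prediction_cost, rejuvenation_cost, restart_cost):
    """Expected cost of a hybrid scheme (hypothetical toy model).

    Assumptions, none taken from the paper:
    - time is split into intervals; `failures` of them would end in a failure
      and `quiet_intervals` would not;
    - one prediction is run per interval at `prediction_cost`;
    - a correctly predicted failure is averted by proactive rejuvenation;
    - a missed failure falls back to a reactive restart;
    - a false alarm triggers an unnecessary rejuvenation.
    """
    intervals = failures + quiet_intervals
    predicted = failures * hit_rate
    missed = failures - predicted
    false_alarms = quiet_intervals * false_alarm_rate
    return (intervals * prediction_cost
            + (predicted + false_alarms) * rejuvenation_cost
            + missed * restart_cost)


if __name__ == "__main__":
    baseline = reactive_cost(failures=10, restart_cost=20.0)
    hybrid = hybrid_cost(failures=10, quiet_intervals=90, hit_rate=0.8,
                         false_alarm_rate=0.05, prediction_cost=0.1,
                         rejuvenation_cost=2.0, restart_cost=20.0)
    print(f"reactive only: {baseline:.1f}")
    print(f"hybrid:        {hybrid:.1f}")
    print(f"saving:        {baseline - hybrid:.1f}")
```

In this toy setting the hybrid scheme only pays off when rejuvenation is cheaper than restarting and the predictor is accurate enough; with a hit rate of zero it is strictly more expensive than the reactive baseline, because prediction and false alarms add cost without averting any failure.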