Architecture-based fault tolerance support for grid applications
Proceedings of the joint ACM SIGSOFT conference -- QoSA and ACM SIGSOFT symposium -- ISARCS on Quality of software architectures -- QoSA and architecting critical systems -- ISARCS
Proceedings of the 16th International ACM Sigsoft symposium on Component-based software engineering
Failure in grids is costly and inevitable. Existing fault tolerance (FT) mechanisms are typically defensive and reactive, and therefore unnecessarily costly. In this paper we propose a hybrid FT approach, the recovery aware component (RAC), which combines reactive and proactive FT. Through component orientation and architecture-level reasoning about FT, RACs support failure recovery or aversion at user-defined granularity, increasing reliability and availability without needless performance sacrifices. We model and analyse a parameterised RAC implementation that combines prediction, proactive rejuvenation and reactive restarting to varying extents, and we calculate cost savings, reliability improvements and cost-benefit under parameters such as prediction frequency and accuracy.
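To make the kind of cost-benefit reasoning described above concrete, the following is a minimal sketch of an expected-cost comparison between purely reactive restarting and a hybrid scheme that adds prediction and proactive rejuvenation. It is not the model from the paper: the parameter names (`p_fail`, `recall`, `false_alarm`, `c_restart`, `c_rejuvenate`, `c_predict`), the per-interval formulation and all numbers are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RacParams:
    """Hypothetical per-interval parameters (illustrative, not from the paper)."""
    p_fail: float        # probability that a failure occurs in an interval
    recall: float        # fraction of failures the predictor catches in time
    false_alarm: float   # probability of a prediction when no failure occurs
    c_restart: float     # cost of a reactive restart after a failure
    c_rejuvenate: float  # cost of a proactive rejuvenation
    c_predict: float     # cost of running the predictor once per interval


def reactive_cost(p: RacParams) -> float:
    """Expected cost per interval when every failure is handled by a restart."""
    return p.p_fail * p.c_restart


def hybrid_cost(p: RacParams) -> float:
    """Expected cost per interval with prediction, rejuvenation and restart."""
    averted = p.p_fail * p.recall * p.c_rejuvenate        # predicted failures
    missed = p.p_fail * (1.0 - p.recall) * p.c_restart    # unpredicted failures
    false_alarms = (1.0 - p.p_fail) * p.false_alarm * p.c_rejuvenate
    return p.c_predict + averted + missed + false_alarms


def saving(p: RacParams) -> float:
    """Expected saving of the hybrid scheme over the purely reactive one."""
    return reactive_cost(p) - hybrid_cost(p)


# Assumed example values; a useless predictor (recall 0) is shown for contrast.
good = RacParams(p_fail=0.1, recall=0.8, false_alarm=0.05,
                 c_restart=100.0, c_rejuvenate=10.0, c_predict=0.5)
useless = replace(good, recall=0.0)
```

Under these assumed values `saving(good)` is positive, while `saving(useless)` is negative, because a predictor that catches nothing still incurs prediction and false-alarm costs. The sketch only illustrates why parameters such as prediction accuracy determine whether proactive FT pays off.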