Cluster systems built from commercially available personal computers connected in a loosely coupled fashion can provide high levels of availability. To improve the availability of personal computer-based Active/Standby cluster systems, we study software rejuvenation, a proactive fault-tolerance technique for handling system failures of software origin. In this paper, we model the software rejuvenation and switchover states as a semi-Markov process and derive the steady-state solutions of the chain. Using these solutions, we calculate the availability and downtime of Active/Standby cluster systems and find that software rejuvenation can improve their availability.
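The steady-state calculation described in the abstract can be sketched in a few lines. This is a minimal illustration and not the paper's model: the five states, the transition probabilities, the mean sojourn times, and the choice of which states count as "up" are all hypothetical values picked for the example. It relies on the standard semi-Markov result that the long-run fraction of time spent in state i is proportional to the embedded chain's stationary probability of i multiplied by the mean sojourn time in i.

```python
# Hypothetical state space for an Active/Standby cluster (assumption, not from the paper).
STATES = ["healthy", "degraded", "rejuvenating", "switchover", "failed"]

# Embedded Markov chain: P[i][j] = probability that the next state is j given state i.
# All probabilities below are made-up illustration values.
P = [
    [0.0, 1.0, 0.0, 0.0, 0.00],   # healthy -> degraded (software ages)
    [0.0, 0.0, 0.8, 0.2, 0.00],   # degraded -> rejuvenating, or failure forces switchover
    [1.0, 0.0, 0.0, 0.0, 0.00],   # rejuvenating -> healthy
    [0.95, 0.0, 0.0, 0.0, 0.05],  # switchover succeeds, or fails and the system goes down
    [1.0, 0.0, 0.0, 0.0, 0.00],   # failed -> healthy after repair
]

# Mean sojourn time in each state, in hours (made-up illustration values).
MEAN_SOJOURN_HOURS = [240.0, 48.0, 0.1, 0.05, 4.0]

# States in which the service is assumed to be available (assumption).
UP_STATES = {"healthy", "degraded"}


def embedded_stationary(P, tol=1e-14, max_iter=10_000):
    """Stationary distribution of the embedded chain.

    Iterates with the lazy chain (I + P) / 2, which has the same stationary
    distribution as P but cannot oscillate on periodic chains.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        new = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def smp_steady_state(pi, mean_sojourn):
    """Time-stationary probabilities: pi_i * h_i, normalized to sum to 1."""
    weights = [p * h for p, h in zip(pi, mean_sojourn)]
    total = sum(weights)
    return [w / total for w in weights]


pi = embedded_stationary(P)
steady = smp_steady_state(pi, MEAN_SOJOURN_HOURS)

# Availability is the long-run fraction of time spent in the up states.
availability = sum(p for s, p in zip(STATES, steady) if s in UP_STATES)
downtime_hours_per_year = (1.0 - availability) * 8760.0

print(f"availability = {availability:.6f}")
print(f"expected downtime = {downtime_hours_per_year:.2f} hours/year")
```

Changing the hypothetical probability of moving from "degraded" to "rejuvenating" (0.8 here) and rerunning shows how the rejuvenation policy shifts time between short planned outages and longer unplanned ones, which is the kind of comparison the availability analysis supports.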