Checkpointing with rollback-recovery is a well-known technique for reducing the completion time of a program in the presence of failures. While checkpointing is corrective in nature, rejuvenation is preventive maintenance of software, aimed at reducing unexpected failures that result mostly from the "aging" phenomenon. In this paper, we show how the two techniques may be used together to further reduce the expected completion time of a program: the idea of using checkpoints to reduce the amount of rollback upon a failure is taken a step further by combining it with rejuvenation. We derive the equations for the expected completion time of a program with finite failure-free running time for three cases: (a) neither checkpointing nor rejuvenation is employed, (b) only checkpointing is employed, and (c) both checkpointing and rejuvenation are employed. We also present numerical results for a Weibull failure time distribution for these three cases and discuss the optimal checkpointing and rejuvenation that minimize the expected completion time. From the numerical results, we draw conclusions about the benefits of these techniques in relation to the nature of the failure distribution.
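The three cases can be explored numerically with a small Monte Carlo sketch. The code below is an illustration under assumptions that are not taken from the paper: the work is split into equal segments with a checkpoint after each one except the last, a failure rolls the program back to the most recent checkpoint, both recovery and rejuvenation reset the Weibull "age" of the system to zero, and no failure occurs during recovery or rejuvenation. All function and parameter names (`completion_time`, `ckpt_cost`, `rejuv_every`, and so on) are hypothetical. The paper derives closed-form equations; this simulation only estimates the same quantity under the stated assumptions.

```python
import random


def completion_time(work, n_segments, ckpt_cost, rejuv_every, rejuv_cost,
                    repair_cost, shape, scale, rng):
    """Simulate one run and return its total completion time.

    Assumed model (hypothetical, for illustration only):
      - n_segments = 1 means no checkpointing (case a).
      - rejuv_every = 0 means no rejuvenation (case b); otherwise the
        system is rejuvenated after every rejuv_every-th checkpoint (case c).
      - Time to failure is Weibull(shape, scale), measured from the last
        renewal, where a renewal is a failure recovery or a rejuvenation.
    """
    seg = work / n_segments
    clock = 0.0                                  # wall-clock time so far
    age = 0.0                                    # time since last renewal
    ttf = rng.weibullvariate(scale, shape)       # age at which next failure hits
    done = 0                                     # segments safely checkpointed
    while done < n_segments:
        # A segment costs its work plus a checkpoint, except the last one.
        need = seg + (ckpt_cost if done < n_segments - 1 else 0.0)
        if age + need <= ttf:
            clock += need
            age += need
            done += 1
            if rejuv_every and done < n_segments and done % rejuv_every == 0:
                clock += rejuv_cost              # preventive maintenance
                age = 0.0
                ttf = rng.weibullvariate(scale, shape)
        else:
            # Failure mid-segment: lose the partial work, recover, retry.
            clock += (ttf - age) + repair_cost
            age = 0.0
            ttf = rng.weibullvariate(scale, shape)
    return clock


def mean_completion_time(runs, seed, **params):
    """Average completion_time over `runs` independent simulated runs."""
    rng = random.Random(seed)
    return sum(completion_time(rng=rng, **params) for _ in range(runs)) / runs


if __name__ == "__main__":
    base = dict(work=10.0, ckpt_cost=0.1, rejuv_cost=0.05, repair_cost=0.5,
                shape=2.0, scale=8.0)
    cases = {
        "(a) neither": dict(n_segments=1, rejuv_every=0),
        "(b) checkpointing only": dict(n_segments=5, rejuv_every=0),
        "(c) checkpointing + rejuvenation": dict(n_segments=5, rejuv_every=1),
    }
    for name, extra in cases.items():
        print(name, mean_completion_time(runs=20000, seed=1, **base, **extra))
```

A Weibull shape parameter above 1 gives an increasing failure rate, which is one way to model aging; varying `shape`, `n_segments`, and `rejuv_every` in this sketch gives a rough feel for how the benefit of each technique depends on the failure distribution.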