Recovery procedures involving time redundancy in the form of instruction retries and program rollbacks have proved to be very effective against transient failures in computer systems. A class of such recovery procedures is presented and analyzed here, and the parameters of each procedure are determined so that the system's operation is optimized. These procedures are then compared in order to select the most appropriate one for given system parameters.