Performance Optimization of Checkpointing Schemes with Task Duplication

Authors:
Avi Ziv;Jehoshua Bruck
Venue:
IEEE Transactions on Computers
Year:
1997

Abstract

In checkpointing schemes with task duplication, checkpointing serves two purposes: detecting faults by comparing the processors' states at checkpoints, and reducing fault recovery time by supplying a safe point to roll back to. In this paper, we show that tuning the checkpointing scheme to a given architecture can significantly reduce execution time. The main idea is to use two types of checkpoints: compare-checkpoints, which compare the states of the redundant processes to detect faults, and store-checkpoints, which store the states to reduce recovery time. With two types of checkpoints, the comparison and storage operations can each be used efficiently, improving the performance of checkpointing schemes. Our results show that, in some cases, using compare- and store-checkpoints can reduce the overhead of DMR (double modular redundancy) checkpointing schemes by as much as 30 percent.
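The interplay between the two checkpoint types can be illustrated with a small Monte Carlo sketch. It is not the paper's analysis. It is a toy model under assumptions chosen here for illustration: faults arrive as a Poisson process with rate `fault_rate` on each of the two processors, any fault is caught at the next compare-checkpoint, a store-checkpoint is placed after every `k_compare` compared segments, and a mismatch rolls both processors back to the last stored state. The function and parameter names (`simulate_dmr`, `t_cmp`, `t_store`, and so on) are hypothetical.

```python
import math
import random


def simulate_dmr(task_len, n_store, k_compare, t_cmp, t_store, fault_rate, rng):
    """Return the total execution time of one duplicated (DMR) task run.

    The task is cut into n_store store-intervals, each cut into k_compare
    equal segments. The two processors' states are compared after every
    segment (cost t_cmp); the state is stored at the end of every
    store-interval (cost t_store). A mismatch rolls back to the last
    stored state, so the current store-interval restarts.
    """
    seg = task_len / (n_store * k_compare)
    # Assumed fault model: probability that at least one of the two
    # processors suffers a fault while executing one segment.
    p_fault = 1.0 - math.exp(-2.0 * fault_rate * seg)
    total = 0.0
    for _ in range(n_store):
        done = 0
        while done < k_compare:
            total += seg + t_cmp          # run one segment, then compare
            if rng.random() < p_fault:
                done = 0                  # mismatch: roll back to last store
            else:
                done += 1
        total += t_store                  # states agree: store them
    return total


def mean_time(runs, seed, **params):
    """Average simulate_dmr over several runs with a seeded generator."""
    rng = random.Random(seed)
    return sum(simulate_dmr(rng=rng, **params) for _ in range(runs)) / runs
```

Sweeping `k_compare` and `n_store` for fixed `t_cmp` and `t_store` gives a rough feel for the trade-off the abstract describes: frequent comparisons shorten the time until a fault is noticed, while frequent stores shorten the distance rolled back, and each has its own cost.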