Performance evaluation of parallel systems employing roll-forward checkpoint schemes

Authors:
Gyung-Leen Park; Hee Yong Youn; Junghoon Lee; Chul Soo Kim; Bongkyu Lee; Sang Joon Lee; Wang-Cheol Song; Yung-Cheol Byun
Affiliations:
Dept. of Computer Science and Statistics, Cheju National University, Cheju, Korea; School of Information and Communications Engineering, Sungkyunkwan University, Suwon, Korea; Dept. of Computer Science and Statistics, Cheju National University, Cheju, Korea; Dept. of Computer Science and Statistics, Cheju National University, Cheju, Korea; Dept. of Computer Science and Statistics, Cheju National University, Cheju, Korea; Faculty of Telecommunication and Computer Engineering, Cheju National University, Cheju, Korea; Faculty of Telecommunication and Computer Engineering, Cheju National University, Cheju, Korea; Faculty of Telecommunication and Computer Engineering, Cheju National University, Cheju, Korea
Venue:
ICCSA'06: Proceedings of the 2006 International Conference on Computational Science and Its Applications - Volume Part V
Year:
2006

Abstract

High performance and reliability are the main goals of parallel and distributed computing systems. To improve both, various checkpoint schemes have been proposed in the literature for decades. However, the lack of a general analytical model has made it difficult to compare the performance of systems employing different checkpoint schemes. This paper develops an analytical model for evaluating the relative response time of systems that employ checkpoint schemes. The model is applied to systems employing the RFC (Roll-Forward Checkpoint), DMR-F (Double Modular Redundancy for Forward recovery), and DST (Duplex with Self-Test) schemes. The results demonstrate the feasibility of the model.