On the checkpointing strategy in desktop grids

Authors:
Dongping Wang;Bin Gong
Affiliations:
Department of Computer Science and Technology, ShanDong University, Jinan, China;Department of Computer Science and Technology, ShanDong University, Jinan, China
Venue:
IDCS'12 Proceedings of the 5th international conference on Internet and Distributed Computing Systems
Year:
2012



Abstract

Checkpointing is an effective measure to ensure the completion of long-running jobs in Desktop Grids, which are subject to frequent resource failures. We focus on checkpointing strategies in the context of Desktop Grids, including volunteer computing systems, where individual hosts follow diverse failure distributions. We propose an algorithm that computes a sequence of checkpoint interval lengths for each individual host from a sample of that host's availability interval lengths. The algorithm approximates the probability distribution of availability interval lengths directly with the sample, without deriving a closed form for the distribution. In simulations with synthetic trace data and with trace data from a real volunteer computing project, this sample-based strategy wastes less time than a periodic strategy in most cases.
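
The idea of scoring a checkpoint schedule directly against a sample of availability interval lengths can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes a constant checkpoint cost, assumes the host fails exactly at the end of each availability interval, and counts as wasted all time that is not covered by a completed checkpoint (checkpoint overhead plus lost work). The function names and parameters are hypothetical.

```python
from typing import Sequence


def wasted_time(schedule: Sequence[float], availability: float, cost: float) -> float:
    """Time in one availability interval that does not become saved work.

    schedule: work lengths between consecutive checkpoints; the last value
              repeats once the sequence is exhausted (all values must be > 0).
    availability: length of the availability interval (assumed: the host
              fails at its end).
    cost: time needed to write one checkpoint (assumed constant).
    """
    elapsed = 0.0  # wall-clock time used so far
    saved = 0.0    # work protected by a completed checkpoint
    i = 0
    while True:
        work = schedule[min(i, len(schedule) - 1)]
        if elapsed + work + cost > availability:
            break  # the failure arrives before this checkpoint completes
        elapsed += work + cost
        saved += work
        i += 1
    return availability - saved


def mean_wasted_time(schedule: Sequence[float], sample: Sequence[float], cost: float) -> float:
    """Average wasted time over the sample, used as the empirical distribution."""
    return sum(wasted_time(schedule, a, cost) for a in sample) / len(sample)


def best_periodic_interval(sample: Sequence[float], cost: float,
                           candidates: Sequence[float]) -> float:
    """Pick the candidate period with the lowest mean wasted time on the sample."""
    return min(candidates, key=lambda t: mean_wasted_time([t], sample, cost))
```

A non-uniform schedule such as `[9.0, 3.0]` can be scored with the same `mean_wasted_time` call and compared against the best periodic candidate. How the paper searches for such a sequence is not described in the abstract and is not reproduced here.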