Distributed Storage Layout Schemes
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 16 - Volume 17
Strategies for storage of checkpointing data using non-dedicated repositories on Grid systems
MGC '05 Proceedings of the 3rd international workshop on Middleware for grid computing
Strategies for Checkpoint Storage on Opportunistic Grids
IEEE Distributed Systems Online
Leveraging 3D PCRAM technologies to reduce checkpoint overhead for future exascale systems
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Construction of efficient or-based deletion-tolerant coding schemes
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hybrid checkpointing using emerging nonvolatile memories for future exascale systems
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Interacting processes in distributed systems save their checkpoints on local disks for efficiency reasons. But, because local checkpoints get unavailable with failing hosts, redundancy schemes similar to RAID-like storage schemes have to be used. In such systems,checkpoints are stable under a particular fault model because they can get reconstructed in the distributed system. In this paper, two variants of stable checkpoint storage will be compared, (i) parity grouping over local checkpoints and (ii) RAID-like distribution of each checkpoint using a software based distributed storage system. An analysis is given to compare costs for collective checkpoint creation, recovery of a single process and rollbackof all processes. The results show that despite of differences in detail, checkpointing using a distributed storage system is a reasonable solution.