This article compares several strategies for storing checkpoint data from parallel applications in an opportunistic grid environment. In terms of computational overhead, storage overhead, and degree of fault tolerance, the authors evaluate the use of replication, parity information, and erasure coding. They use an object-oriented grid middleware solution called InteGrade to implement these strategies and to perform the evaluation experiments.
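
To make the trade-off between the three strategies concrete, the following is a minimal Python sketch, not the authors' InteGrade implementation. It shows single-parity (XOR) protection of a checkpoint split into k fragments, plus a rough storage-factor calculation for each strategy. The function names (`split`, `parity`, `recover`, `storage_factor`) and the parameters `k` (data fragments) and `m` (extra copies or redundant fragments) are assumptions made for illustration only.

```python
from functools import reduce


def split(data: bytes, k: int) -> list:
    """Split data into k equal-size fragments, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(fragments: list) -> bytes:
    """XOR of all fragments; stored as one additional fragment."""
    return reduce(xor_bytes, fragments)


def recover(fragments: list, parity_fragment: bytes) -> list:
    """Rebuild the single missing fragment (marked None) from the rest."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) != 1:
        raise ValueError("single parity tolerates exactly one lost fragment")
    present = [f for f in fragments if f is not None]
    rebuilt = reduce(xor_bytes, present, parity_fragment)
    restored = list(fragments)
    restored[missing[0]] = rebuilt
    return restored


def storage_factor(strategy: str, k: int = 1, m: int = 1) -> float:
    """Bytes stored per byte of checkpoint data (hypothetical helper).

    replication: the original plus m extra full copies
    parity:      k data fragments plus one parity fragment
    erasure:     k data fragments plus m redundant fragments
    """
    if strategy == "replication":
        return float(1 + m)
    if strategy == "parity":
        return (k + 1) / k
    if strategy == "erasure":
        return (k + m) / k
    raise ValueError(f"unknown strategy: {strategy}")
```

Under these assumptions, keeping two extra replicas stores three times the checkpoint size and survives two lost copies, while four fragments plus one parity fragment store 1.25 times the size but survive only one loss. Erasure coding with k data and m redundant fragments sits between the two in storage cost, at the price of more computation to encode and decode.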