Evaluating Distributed Checkpointing Protocol
ICDCS '03 Proceedings of the 23rd International Conference on Distributed Computing Systems
Models and Modeling Infrastructures for Global Computational Platforms
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 18 - Volume 19
Fault tolerant high performance computing by a coding approach
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Performance analysis of different checkpointing and recovery schemes using stochastic model
Journal of Parallel and Distributed Computing
Cooperative checkpointing: a robust approach to large-scale systems reliability
Proceedings of the 20th annual international conference on Supercomputing
Model-based performance evaluation of distributed checkpointing protocols
Performance Evaluation
Coordinated checkpoint versus message log for fault tolerant MPI
International Journal of High Performance Computing and Networking
Using checkpointing to recover from poor multi-site parallel job scheduling decisions
Proceedings of the 5th international workshop on Middleware for grid computing: held at the ACM/IFIP/USENIX 8th International Middleware Conference
Distributed computing power service coordination based on peer-to-peer grids architecture
Expert Systems with Applications: An International Journal
Numerical computation algorithms for sequential checkpoint placement
Performance Evaluation
Cluster Performance Forecasting Using Predictive Modeling for Virtual Beowulf Clusters
ICDCN '09 Proceedings of the 10th International Conference on Distributed Computing and Networking
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Evaluating cooperative checkpointing for supercomputing systems
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Cooperative checkpointing theory
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Fault-tolerant dynamic job scheduling policy
ICA3PP'05 Proceedings of the 6th international conference on Algorithms and Architectures for Parallel Processing
Performance implications of failures in large-scale cluster scheduling
JSSPP'04 Proceedings of the 10th international conference on Job Scheduling Strategies for Parallel Processing
A checkpoint-on-failure protocol for algorithm-based recovery in standard MPI
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Comparing checkpoint and rollback recovery schemes in a cluster system
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Automatic identification of application I/O signatures from noisy server-side traces
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |