Efficient and flexible fault tolerance and migration of scientific simulations using CUMULVS
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
IEEE Transactions on Parallel and Distributed Systems
Computing in the RAIN: A Reliable Array of Independent Nodes
IEEE Transactions on Parallel and Distributed Systems
CLIP: a checkpointing tool for message-passing parallel programs
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Towards energy-aware software-based fault tolerance in real-time systems
Proceedings of the 2002 international symposium on Low power electronics and design
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
Automated application-level checkpointing of MPI programs
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Adaptive incremental checkpointing for massively parallel systems
Proceedings of the 18th annual international conference on Supercomputing
Application-level checkpointing for shared memory programs
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Checkpointing for Peta-Scale Systems: A Look into the Future of Practical Rollback-Recovery
IEEE Transactions on Dependable and Secure Computing
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Optimizing Checkpoint Sizes in the C3 System
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Mobile MPI programs in computational grids
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Experimental evaluation of application-level checkpointing for OpenMP programs
Proceedings of the 20th annual international conference on Supercomputing
Transparent checkpoint-restart of multiple processes on commodity operating systems
ATC '07 Proceedings of the 2007 USENIX Annual Technical Conference
Fault-tolerant stream processing using a distributed, replicated file system
Proceedings of the VLDB Endowment
Proceedings of the 6th ACM conference on Computing frontiers
Transparent parallel checkpointing and migration in clusters and ClusterGrids
International Journal of Computational Science and Engineering
Towards an adaptive middleware for opportunistic environment: a mobile agent approach
Proceedings of the 7th International Workshop on Middleware for Grids, Clouds and e-Science
A fault-tolerant strategy for virtualized HPC clusters
The Journal of Supercomputing
An adaptive task-level fault-tolerant approach to Grid
The Journal of Supercomputing
Fast checkpoint recovery algorithms for frequently consistent applications
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Application-Level checkpointing techniques for parallel programs
ICDCIT'06 Proceedings of the Third international conference on Distributed Computing and Internet Technology
Distributed run-time resource management for malleable applications on many-core platforms
Proceedings of the 50th Annual Design Automation Conference
Compiler-Assisted Checkpointing of Parallel Codes: The Cetus and LLVM Experience
International Journal of Parallel Programming
X10-FT: Transparent fault tolerance for APGAS language and runtime
Parallel Computing