On the relevance of communication costs of rollback-recovery protocols
Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing
Trade-offs in implementing causal message logging protocols
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints
IEEE Transactions on Computers
A protocol for causally ordered message delivery in mobile computing systems
Mobile Networks and Applications - Special issue on personal communications services
Efficient transparent application recovery in client-server information systems
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Support for Software Interrupts in Log-Based Rollback-Recovery
IEEE Transactions on Computers
Asynchronous recovery without using vector timestamps
Journal of Parallel and Distributed Computing
Performance Evaluation of Fault Tolerance for Parallel Applications in Networked Environments
ICPP '97 Proceedings of the international Conference on Parallel Processing
An Efficient Optimistic Message Logging Scheme for Recoverable Mobile Computing Systems
IEEE Transactions on Mobile Computing
Supporting nondeterministic execution in fault-tolerant systems
FTCS '96 Proceedings of the The Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing (FTCS '96)
Why Optimistic Message Logging Has Not Been Used in Telecommunications Systems
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Selective Checkpointing and Rollbacks in Multithreaded Distributed Systems
ICDCS '01 Proceedings of the The 21st International Conference on Distributed Computing Systems
Improving Logging and Recovery Performance in Phoenix/App
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Recovery guarantees for Internet applications
ACM Transactions on Internet Technology (TOIT)
MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Impact of Event Logger on Causal Message Logging Protocols for Fault Tolerant MPI
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Adaptive and reliable parallel computing on networks of workstations
ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
Live data center migration across WANs: a robust cooperative context aware approach
Proceedings of the 2007 SIGCOMM workshop on Internet network management
Coordinated checkpoint versus message log for fault tolerant MPI
International Journal of High Performance Computing and Networking
Journal of Parallel and Distributed Computing
Novel Crash Recovery Approach for Concurrent Failures in Cluster Federation
GPC '09 Proceedings of the 4th International Conference on Advances in Grid and Pervasive Computing
International Journal of Parallel Programming
Team-Based Message Logging: Preliminary Results
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Unstoppable stateful PHP web services
WISE'06 Proceedings of the 7th international conference on Web Information Systems
Garbage collection in a causal message logging protocol
HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
Post-failure recovery of MPI communication capability: Design and rationale
International Journal of High Performance Computing Applications
Hi-index | 0.01 |
Abstract: Message logging protocols are an integral part of a technique for implementing processes that can recover from crash failures. All message logging protocols require that, when recovery is complete, there be no orphan processes, which are surviving processes whose states are inconsistent with the recovered state of a crashed process. We give a precise specification of the consistency property "no orphan processes". From this specification, we describe how different existing classes of message logging protocols (namely optimistic, pessimistic, and a class that we call causal) implement this property. We then propose a set of metrics to evaluate the performance of message logging protocols, and characterize the protocols that are optimal with respect to these metrics. Finally, starting from a protocol that relies on causal delivery order, we show how to derive optimal causal protocols that tolerate f overlapping failures and recoveries for a parameter f:1/spl les/f/spl les/n.