Causal message-logging protocols have several attractive properties: they introduce no blocking, send no additional messages over those sent by the application, and never create orphans. Causal message logging, however, does require the causal effects of the deliveries of messages to be tracked. The information concerning causality tracking is piggybacked on application messages, and the amount of such information can become large.

In this paper we study the cost of tracking causality in causal message-logging protocols. One can track causality as accurately as possible, but to do so requires piggybacking a considerable amount of additional information. One can reduce the amount of piggybacked information on each message by reducing the accuracy of causality tracking. But then, causal message logging may piggyback the reduced amount of information on more messages.

We specify six different methods of tracking causality, each representing a natural choice based on the specification of causal message logging. We describe how these six methods can be implemented and compare them in terms of how large a piggyback load they impose. This load depends on the application that is using causal message logging. We characterize some applications for which a given method has the smallest piggyback load, and use simulation to study the size of the piggyback load for two different models of applications.
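The trade-off between tracking accuracy and piggyback size can be illustrated with a toy model. The sketch below is not one of the six methods studied in the paper. It is a minimal illustration under stated assumptions: a hypothetical `Determinant` record describes one message delivery, each hypothetical `Process` remembers which processes it believes already hold each determinant, and stability (the point at which a determinant no longer needs to be piggybacked at all) is ignored. In the "precise" mode every piggybacked determinant also carries its holder set, so later sends can skip destinations that already have it. In the "coarse" mode no holder sets are shipped, so the same determinant is piggybacked again on later messages.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Determinant:
    """Hypothetical record of one delivery: who sent, who received, in what order."""
    sender: str
    receiver: str
    recv_seq: int


@dataclass
class Process:
    name: str
    precise: bool          # True: ship holder sets and use them to filter
    recv_seq: int = 0
    # Determinant -> names of processes believed to hold a copy (assumption)
    holders: dict = field(default_factory=dict)

    def send(self, dest: str) -> dict:
        """Build an application message to dest with its piggyback."""
        if self.precise:
            # Skip determinants that dest is already believed to hold.
            payload = {d: set(h) for d, h in self.holders.items() if dest not in h}
        else:
            # Coarse tracking: send every known determinant, without holder sets.
            payload = {d: set() for d in self.holders}
        for d in payload:
            self.holders[d].add(dest)  # dest will hold these once it delivers
        return {"src": self.name, "piggyback": payload}

    def deliver(self, msg: dict) -> None:
        """Merge the piggyback, then record a determinant for this delivery."""
        for d, h in msg["piggyback"].items():
            self.holders.setdefault(d, set()).update(h | {msg["src"], self.name})
        self.recv_seq += 1
        det = Determinant(msg["src"], self.name, self.recv_seq)
        self.holders[det] = {self.name}


def piggyback_sizes(precise: bool) -> tuple:
    """r sends to p, then p sends to q twice; return the two piggyback sizes."""
    r, p = Process("r", precise), Process("p", precise)
    p.deliver(r.send("p"))
    first, second = p.send("q"), p.send("q")
    return len(first["piggyback"]), len(second["piggyback"])
```

In this toy run, `piggyback_sizes(True)` gives `(1, 0)` and `piggyback_sizes(False)` gives `(1, 1)`: the precise mode pays for a holder set on the first message but sends nothing on the second, while the coarse mode sends a smaller entry on both.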