Publishing is a model and mechanism for crash recovery in a distributed computing environment. Published communication works for systems connected by a broadcast medium: every message transmitted over the network is recorded. The recovery mechanism can be completely transparent both to the failed process and to every process that interacts with it. Although published communication is intended for broadcast networks such as a bus, a ring, or an Ethernet, it can also be used in other environments. A recorder reliably stores every transmitted message, along with checkpoint and recovery information. When it detects a failure, the recorder can restart the affected processes from their checkpoints. It then resends to each restarted process all messages that were sent to that process since its checkpoint was taken, while ignoring the duplicate messages the process sends again as it re-executes. Message-based systems without shared memory can use published communication to recover groups of processes. Simulations show that a single recorder can support at least five multi-user minicomputers on a standard Ethernet. The prototype implemented in DEMOS/MP demonstrates that error recovery can be transparent to user processes and can be centralized in the network.
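The recorder's replay rule can be illustrated with a minimal sketch. It is not the DEMOS/MP implementation. It assumes that processes are deterministic, that each message carries a per-sender sequence number stored in the sender's checkpointed state, and that failures are simulated rather than detected. The names `Recorder`, `Counter`, `recovery_info`, and `restart` are hypothetical. On restart, the recorder replays every logged message addressed to the failed process since its checkpoint, and it drops any message the re-executing process regenerates with a sequence number that was already published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    src: str
    dst: str
    seq: int   # per-sender sequence number (assumption: part of the sender's checkpointed state)
    body: str


class Recorder:
    """Hypothetical recorder that sees every message on the broadcast medium."""

    def __init__(self):
        self.log = []          # all published messages, in broadcast order
        self.checkpoints = {}  # pid -> (process state, log length when the checkpoint was taken)

    def publish(self, msg):
        self.log.append(msg)

    def checkpoint(self, pid, state):
        self.checkpoints[pid] = (state, len(self.log))

    def recovery_info(self, pid):
        """Checkpointed state, messages to replay, and sequence numbers pid already sent."""
        state, pos = self.checkpoints[pid]
        since = self.log[pos:]
        replay = [m for m in since if m.dst == pid]
        already_sent = {m.seq for m in since if m.src == pid}
        return state, replay, already_sent


class Counter:
    """Toy deterministic process: sums the numbers it receives and acknowledges each one."""

    def __init__(self, pid, total=0, next_seq=0):
        self.pid, self.total, self.next_seq = pid, total, next_seq

    def snapshot(self):
        return (self.total, self.next_seq)

    def on_message(self, msg):
        self.total += int(msg.body)
        ack = Message(self.pid, msg.src, self.next_seq, f"ack {self.total}")
        self.next_seq += 1
        return ack


def restart(recorder, pid):
    """Restart pid from its checkpoint and re-execute it against the recorded messages."""
    (total, next_seq), replay, already_sent = recorder.recovery_info(pid)
    proc = Counter(pid, total, next_seq)
    for msg in replay:
        out = proc.on_message(msg)
        if out.seq in already_sent:
            continue  # the failed incarnation already published this message: drop the duplicate
        recorder.publish(out)
    return proc


def demo():
    rec = Recorder()
    b = Counter("B")
    rec.checkpoint("B", b.snapshot())
    for seq, n in enumerate([3, 4]):
        m = Message("A", "B", seq, str(n))
        rec.publish(m)
        rec.publish(b.on_message(m))
    # A third message is recorded, but B crashes before acknowledging it.
    rec.publish(Message("A", "B", 2, "5"))
    b = restart(rec, "B")
    return rec, b
```

In `demo()`, B is deterministic and its sequence counter is restored from the checkpoint. Re-execution therefore regenerates the acknowledgements for 3 and 4 with their original sequence numbers, and the recorder suppresses them. Only the missing acknowledgement for 5 reaches A, so A never sees a duplicate.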