An object is said to be resilient if operations on the object can be performed even if some nodes of the network fail. To support resiliency, copies of the objects are stored on different nodes, and access to different copies is coordinated. The properties of broadcast networks are utilized to devise a distributed scheme for implementing resilient objects. All the copies of an object are equivalent. If an operation is requested on an object, the operation is performed on all the copies of the object. No special mechanisms are needed if some copies are not available due to node failures, as long as there is at least one active node that has a copy of the object and the network does not get partitioned. Simulation results indicate that the number of messages needed to perform an operation increases slowly and the response time for performing an operation decreases as the number of copies increases.
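The core idea can be illustrated with a minimal sketch. It is not the paper's protocol: the names `Replica`, `ResilientCounter`, and `perform` are hypothetical, the broadcast network is modeled as a simple loop over nodes, and message ordering, recovery of failed nodes, and network partitions are all left out. It only shows that when every copy is equivalent and each operation is applied to all active copies, an operation still succeeds as long as one copy remains on an active node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """One copy of the object, held on a single node (hypothetical model)."""
    node: str
    alive: bool = True
    value: int = 0

    def apply(self, delta: int) -> int:
        # Apply the operation to this copy and return the new state.
        self.value += delta
        return self.value


@dataclass
class ResilientCounter:
    """A counter whose copies are all equivalent; there is no primary copy."""
    replicas: List[Replica] = field(default_factory=list)

    def perform(self, delta: int) -> int:
        # Assumption: one "broadcast" reaches every node, and only active
        # nodes apply the operation. Failed nodes are simply skipped.
        results = [r.apply(delta) for r in self.replicas if r.alive]
        if not results:
            raise RuntimeError("no active node holds a copy of the object")
        # Active copies are equivalent, so any one result can be returned.
        return results[0]


counter = ResilientCounter([Replica("A"), Replica("B"), Replica("C")])
first = counter.perform(5)          # applied on A, B, and C
counter.replicas[0].alive = False   # node A fails
second = counter.perform(2)         # applied on B and C only
```

In this sketch the caller does nothing special when node A fails; the operation is served by the remaining copies. The copy on A is left stale, and bringing it up to date on recovery is outside what this example models.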