In this paper, we present new results on reconfiguring stateful interactive processes in the presence of faults. More precisely, we consider a set of servers/processes that have the same functionality, i.e., they can perform the same tasks and provide the same set of services to their clients. When several of them turn out to be faulty, we want to reconfigure the system so that the clients of the faulty servers/processes are served by other, fault-free servers in a way that is transparent to all clients. We propose a new method for reconfiguring in the presence of faults: compensation paths. Compensation paths are an efficient way of shifting spare resources from where they are available to where they are needed. We also present simple optimal and suboptimal reconfiguration algorithms of low polynomial time complexity: O(nm log(n^2/m)) for the optimal algorithms and O(m) for the suboptimal ones, where n is the number of processes and m is the number of primary-backup relationships. The optimal algorithms, which are centralized, compute a reconfiguration whenever one is possible. The suboptimal algorithms may sometimes fail to reconfigure the system even though the optimal centralized algorithms would succeed. However, the suboptimal algorithms have advantages over the centralized optimal algorithms in time complexity and communication overhead.
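To make the idea of shifting spare capacity along primary-backup relationships concrete, the following is a minimal sketch, not the paper's algorithms. It assumes a simplified model that the abstract does not spell out: each faulty server has one unit of load to relocate, each fault-free server has a known number of spare units, and load may move only from a server to one of its backups. Under those assumptions, finding a reconfiguration becomes a maximum-flow problem, solved here with plain breadth-first augmenting paths (Edmonds-Karp style), which does not achieve the bound quoted above. The function name `plan_reconfiguration` and its parameters are hypothetical.

```python
from collections import defaultdict, deque


def plan_reconfiguration(backups, spare, faulty):
    """Return {(src, dst): units shifted} or None if no plan exists.

    Assumed model (not taken from the paper):
      backups: dict server -> list of servers that can take over its load
      spare:   dict server -> number of extra load units it can host
      faulty:  set of failed servers, each with one load unit to relocate
    """
    SRC, SNK = "_source", "_sink"
    orig = {}                  # original capacities, keyed by (u, v)
    cap = defaultdict(int)     # residual capacities
    adj = defaultdict(set)     # neighbours in the residual graph

    def add_edge(u, v, c):
        orig[(u, v)] = orig.get((u, v), 0) + c
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)          # reverse direction is needed for residual edges

    need = len(faulty)
    for f in faulty:
        add_edge(SRC, f, 1)               # one unit to relocate per failure
    for p, bs in backups.items():
        for b in bs:
            if b not in faulty:           # a failed server cannot host load
                add_edge(p, b, need)
    for s, k in spare.items():
        if s not in faulty and k > 0:
            add_edge(s, SNK, k)           # spare capacity absorbs load

    def find_path():
        """Breadth-first search for a path with residual capacity."""
        parent = {SRC: None}
        queue = deque([SRC])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    if v == SNK:
                        return parent
                    queue.append(v)
        return None

    routed = 0
    while True:
        parent = find_path()
        if parent is None:
            break
        v = SNK
        while parent[v] is not None:      # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        routed += 1

    if routed < need:
        return None                       # some failed server is stranded
    return {(u, v): c - cap[(u, v)] for (u, v), c in orig.items()
            if u != SRC and v != SNK and c - cap[(u, v)] > 0}


chain = {"A": ["B"], "B": ["C"], "C": ["D"]}
print(plan_reconfiguration(chain, {"C": 1}, {"A"}))
```

In this example, A fails and its backup B has no spare capacity, so B takes A's load and hands one unit of its own load to C, which does have a spare. The result `{("A", "B"): 1, ("B", "C"): 1}` traces one path in the spirit of a compensation path, under the assumed model. If both A and B fail, A has no fault-free backup and the function returns `None`.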