Implementing fault-tolerant services using state machines: beyond replication
DISC'10 Proceedings of the 24th international conference on Distributed computing
Fused state machines for fault tolerance in distributed systems
OPODIS'11 Proceedings of the 15th international conference on Principles of Distributed Systems
Given a set of n different deterministic finite state machines (DFSMs) modeling a distributed system, we examine the problem of tolerating f crash or Byzantine faults in such a system. The traditional approach to this problem involves replication and requires n · f backup DFSMs for crash faults and 2 · n · f backup DFSMs for Byzantine faults. For example, to tolerate two crash faults in three DFSMs, a replication-based technique needs two copies of each of the given DFSMs, resulting in a system with six backup DFSMs. In this paper, we question the optimality of such an approach and present an approach called (f, m)-fusion that permits fewer backups than replication-based approaches. Given n different DFSMs, we examine the problem of tolerating f faults using just m additional DFSMs. We introduce the theory of fusion machines and provide an algorithm to generate backup DFSMs for both crash and Byzantine faults. We have implemented our algorithms in Java and have used them to automatically generate backup DFSMs for several examples.
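To make the backup counts concrete, the following is a minimal sketch, not the paper's algorithm. It computes the replication cost stated in the abstract (n · f for crash faults, 2 · n · f for Byzantine faults) and then illustrates the general idea of using fewer backups with a toy case. The toy case is an assumption introduced here for illustration: two mod-3 counters A and B, plus one hypothetical backup F that counts the events of both, so a single crash of A can be repaired from B and F. The class name `FusionSketch` and all method names are likewise hypothetical.

```java
class FusionSketch {
    // Backups needed by replication, as stated in the abstract:
    // n * f for crash faults, 2 * n * f for Byzantine faults.
    static int replicationBackups(int n, int f, boolean byzantine) {
        return (byzantine ? 2 : 1) * n * f;
    }

    // Final state of a mod-3 counter DFSM that starts in state 0 and
    // advances only on the events listed in `counted` (toy machine, assumed).
    static int run(String events, String counted) {
        int state = 0;
        for (char e : events.toCharArray()) {
            if (counted.indexOf(e) >= 0) {
                state = (state + 1) % 3;
            }
        }
        return state;
    }

    // Recover the crashed counter A from the surviving counter B and the
    // backup F, using the invariant F = (A + B) mod 3 of this toy example.
    static int recoverA(int stateB, int stateF) {
        return Math.floorMod(stateF - stateB, 3);
    }

    public static void main(String[] args) {
        // The abstract's example: three DFSMs, two crash faults.
        System.out.println(replicationBackups(3, 2, false)); // 6

        String events = "abbabaa";
        int a = run(events, "a");   // machine A counts 'a' events
        int b = run(events, "b");   // machine B counts 'b' events
        int f = run(events, "ab");  // assumed backup F counts both

        // Pretend A crashed: rebuild its state from B and F alone.
        System.out.println(recoverA(b, f) == a);
    }
}
```

In this toy setting one backup machine tolerates one crash among two machines, whereas replication would need n · f = 2 backups. How to generate such backups for arbitrary DFSMs, and for Byzantine faults, is what the paper's algorithm addresses and is not shown here.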