Checking a Non-Byzantine FT Scheme against Byzantine Faults
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Cooperative applications are widely used, e.g. as parallel calculations or distributed information processing systems. While such applications meet users' demands and offer a performance improvement, they are also more susceptible to faults in any of the computer nodes they use. Often a single fault may cause a complete application failure. On the other hand, the redundancy in distributed systems can be utilized for fast fault detection and recovery. We therefore follow an approach based on duplicating each application process to detect crashes and malfunctions of individual computer nodes. We concentrate on two aspects of efficient fault tolerance: fast fault detection, and recovery without significantly delaying application progress. The contribution of this work is twofold. First, we present a new fault detection protocol for duplicated processes. Second, we enhance a roll-forward recovery scheme so that it is applicable to a set of cooperative processes in conformity with the communication protocol.