On verifying fault tolerance of distributed protocols

  • Authors:
  • Dana Fisman;Orna Kupferman;Yoad Lustig

  • Affiliations:
  • School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel and IBM Haifa Research Lab, Haifa, Israel;School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel;School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel

  • Venue:
  • TACAS'08/ETAPS'08 Proceedings of the Theory and practice of software, 14th international conference on Tools and algorithms for the construction and analysis of systems
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Distributed systems are composed of processes connected in some network. Distributed systems may suffer from faults: processes may stop, be interrupted, or be maliciously attacked. Fault-tolerant protocols are designed to be resistant to faults. Proving the resistance of protocols to faults is a very challenging problem, as it combines the parameterized setting that distributed systems are based-on, with the need to consider a hostile environment that produces the faults. Considering all the possible fault scenarios for a protocol is very difficult. Thus, reasoning about fault-tolerance protocols utterly needs formal methods. In this paper we describe a framework for verifying the fault tolerance of (synchronous or asynchronous) distributed protocols. In addition to the description of the protocol and the desired behavior, the user provides the fault type (e.g., failstop, Byzantine) and its distribution (e.g., at most half of the processes are faulty). Our framework is based on augmenting the description of the configurations of the system by a mask describing which processes are faulty. We focus on regular model checking and show how it is possible to compile the input for the model-checking problem to one that takes the faults and their distribution into an account, and perform regular model-checking on the compiled input. We demonstrate the effectiveness of our framework and argue for its generality.