The Do-All problem with Byzantine processor failures

  • Authors:
  • Antonio Fernández;Chryssis Georgiou;Alexander Russell;Alex A. Shvartsman

  • Affiliations:
  • GSyC, Universidad Rey Juan Carlos, Móstoles, Spain and University of Connecticut;Department of Computer Science, University of Cyprus, Nicosia, Cyprus;Department of Computer Science and Engineering, University of Connecticut, Storrs, CT;Department of Computer Science and Engineering, University of Connecticut, Storrs, CT and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, ...

  • Venue:
  • Theoretical Computer Science - Foundations of software science and computation structures
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Do-All is the abstract problem of using n processors to cooperatively perform m independent tasks in the presence of failures. This problem and its derivatives have been a centerpiece in the study of trade-offs between efficiency and fault-tolerance in cooperative computing environments. Many algorithms have been developed for Do-All in various models of computation, including message-passing, partitionable networks, and shared-memory models under a variety of failure models.This work initiates the study of the Do-All problem for synchronous message-passing processors prone to Byzantine failures. In particular, upper and lower bounds are given on the complexity of Do-All for several cases: (a) the case where the maximum number of faulty processors f is known a priori, (b) the case where f is not known, (c) the case where a task execution can be verified (without re-executing the task), and (d) the case where task executions cannot be verified. The efficiency of algorithms is evaluated in terms of work and message complexities. The work complexity accounts for all computational steps taken by the processors and the message complexity accounts for all messages sent by the processors during the computation. The work and messages of a faulty processor are counted only until the processor fails to follow the algorithm. It is shown that in some cases obtaining work Θ(mn) is the best one can do. It is also shown that in certain cases communication cannot help improve work efficiency.