Performing tasks on synchronous restartable message-passing processors

  • Authors:
  • Bogdan S. Chlebus; Roberto De Prisco; Alex A. Shvartsman

  • Affiliations:
  • Instytut Informatyki, Uniwersytet Warszawski, ul. Banacha 2, 02-097 Warszawa, Poland; Laboratory for Computer Science, Massachusetts Institute of Technology, 545 Technology Square, NE43-316 Cambridge, MA and Dipartimento di Informatica ed Applicazioni, University of Salerno, 84081 ...; Laboratory for Computer Science, Massachusetts Institute of Technology, 545 Technology Square, NE43-316 Cambridge, MA and Department of Computer Science and Engineering, University of Connecticut, ...

  • Venue:
  • Distributed Computing
  • Year:
  • 2001

Abstract

This work considers the problem of performing $t$ tasks in a distributed system of $p$ fault-prone processors. This problem, called DO-ALL herein, was introduced by Dwork, Halpern and Waarts. The solutions presented here are for the model of computation that abstracts a synchronous message-passing distributed system with processor stop-failures and restarts. We present two new algorithms based on a new aggressive coordination paradigm by which multiple coordinators may be active as the result of failures. The first algorithm is tolerant of $f < p$ stop-failures and does not allow restarts. Its available processor steps (work) complexity is $S = O((t + p \log p / \log\log p) \cdot \log f)$ and its message complexity is $M = O(t + p \log p / \log\log p + fp)$. Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for $p = t$ and large $f$, it achieves better work complexity. This algorithm is used as the basis for another algorithm that tolerates stop-failures and restarts. This new algorithm is the first solution for the DO-ALL problem that efficiently deals with processor restarts. Its available processor steps complexity is $S = O((t + p \log p + f) \cdot \min\{\log p, \log f\})$, and its message complexity is $M = O(t + p \log p + fp)$, where $f$ is the total number of failures.
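
To get a feel for how the stated bounds scale, the expressions inside the O(·) can be evaluated directly. The following is only an illustrative sketch, not part of the paper: the function names are hypothetical, logarithms are assumed to be base 2, and the constant factors hidden by the asymptotic notation are taken to be 1, so the numbers indicate growth rates rather than actual step or message counts.

```python
import math


def _check(t: int, p: int, f: int) -> None:
    # Assumed guards so every logarithm below is positive and defined:
    # log log p > 0 needs p > 2, and log f > 0 needs f >= 2.
    if t < 1 or p <= 2 or f < 2:
        raise ValueError("need t >= 1, p > 2 and f >= 2")


def work_no_restarts(t: int, p: int, f: int) -> float:
    """(t + p log p / log log p) * log f, constants dropped."""
    _check(t, p, f)
    lg_p = math.log2(p)
    return (t + p * lg_p / math.log2(lg_p)) * math.log2(f)


def messages_no_restarts(t: int, p: int, f: int) -> float:
    """t + p log p / log log p + f p, constants dropped."""
    _check(t, p, f)
    lg_p = math.log2(p)
    return t + p * lg_p / math.log2(lg_p) + f * p


def work_with_restarts(t: int, p: int, f: int) -> float:
    """(t + p log p + f) * min(log p, log f), constants dropped."""
    _check(t, p, f)
    lg_p = math.log2(p)
    return (t + p * lg_p + f) * min(lg_p, math.log2(f))


def messages_with_restarts(t: int, p: int, f: int) -> float:
    """t + p log p + f p, constants dropped."""
    _check(t, p, f)
    return t + p * math.log2(p) + f * p


if __name__ == "__main__":
    # Example with p = t = 16 processors/tasks and f = 4 failures.
    t, p, f = 16, 16, 4
    print(work_no_restarts(t, p, f))        # 96.0
    print(messages_no_restarts(t, p, f))    # 112.0
    print(work_with_restarts(t, p, f))      # 168.0
    print(messages_with_restarts(t, p, f))  # 144.0
```

In the first algorithm f is bounded by p, whereas in the restartable setting f counts all failures over the execution, so the two pairs of values are not directly comparable for the same f.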