Time and communication efficient consensus for crash failures

  • Authors:
  • Bogdan S. Chlebus; Dariusz R. Kowalski

  • Affiliations:
  • Department of Computer Science and Eng., UCDHSC, Denver, CO; Department of Computer Science, University of Liverpool, Liverpool, UK

  • Venue:
  • DISC'06 Proceedings of the 20th international conference on Distributed Computing
  • Year:
  • 2006

Abstract

This paper presents consensus solutions optimized simultaneously for time and communication complexity. The computing environment is synchronous message passing with processors prone to crashes. The number $f$ of crashes can be arbitrary as long as it is smaller than the number $n$ of processors in the system. As a building block for our consensus solutions, we consider the gossiping problem, in which processors have input rumors and the goal of every processor is to learn all the rumors of the processors that have not crashed. We show that gossiping can be achieved by a deterministic algorithm working in $\mathcal{O}(\log^3 n)$ time and sending $\mathcal{O}(n\log^4 n)$ point-to-point messages. These results improve upon the best previously known deterministic solution of gossiping, which operated in $\mathcal{O}(\log^2 n)$ time and generated $\mathcal{O}(n^{1+\varepsilon})$ messages, for any constant $\varepsilon > 0$. The efficient gossiping algorithm is applied to the problem of reaching consensus. In the Consensus problem, each processor starts with its input value and the goal is to have all processors agree on exactly one value among the inputs. First we develop a deterministic algorithm solving Consensus in $\mathcal{O}(n)$ time while sending $\mathcal{O}(n \log^5 n)$ messages. The best previously known algorithms solving Consensus in $\mathcal{O}(n)$ time had message complexity bounded by $\mathcal{O}(n^{1+\varepsilon})$, for any constant $\varepsilon > 0$. Next we improve the Consensus solution so that it is early stopping, which means that it terminates in $\mathcal{O}(f+1)$ time, where $f$ is the number of crashes in an execution, while preserving the message complexity $\mathcal{O}(n \log^5 n)$.
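
To make the problem setting concrete, the following is a minimal sketch of the textbook flooding approach to Consensus in a synchronous crash-prone system. It is not the algorithm of this paper: every live processor sends to every other processor in every round, so it costs on the order of $n^2$ messages per round, which is the kind of communication overhead the paper's gossiping-based solutions are designed to avoid. The function name `floodset`, the crash-schedule format, and the rule of deciding on the minimum known value are assumptions made for illustration only.

```python
def floodset(inputs, crashes, rounds):
    """Simulate flooding-based consensus under crash failures (illustrative only).

    inputs  : list of input values, one per processor
    crashes : dict pid -> (round, recipients); pid crashes in that round
              after its message reaches only `recipients` (hypothetical format)
    rounds  : number of synchronous rounds to run; f + 1 rounds suffice
              for this baseline when at most f processors crash
    Returns (decisions of surviving processors, total messages sent).
    """
    n = len(inputs)
    known = [{v} for v in inputs]   # values each processor has learned so far
    alive = set(range(n))
    messages = 0
    for r in range(1, rounds + 1):
        inbox = [set() for _ in range(n)]
        for p in sorted(alive):
            if p in crashes and crashes[p][0] == r:
                targets = set(crashes[p][1])   # partial delivery, then crash
            else:
                targets = set(range(n)) - {p}  # broadcast to everyone else
            for q in targets:
                inbox[q] |= known[p]
                messages += 1
        # Processors that crashed in this round take no further steps.
        alive -= {p for p, (cr, _) in crashes.items() if cr == r}
        for q in alive:
            known[q] |= inbox[q]
    # Each survivor decides on the smallest value it knows.
    decisions = {p: min(known[p]) for p in alive}
    return decisions, messages


# Processor 0 holds the smallest input and crashes in round 1, reaching only
# processor 1; processor 1 crashes in round 2, reaching only processor 2.
schedule = {0: (1, [1]), 1: (2, [2])}
decisions, messages = floodset([0, 5, 7, 9], schedule, rounds=3)
```

In this schedule there are f = 2 crashes, and after f + 1 = 3 rounds the two survivors both decide 0, using 23 messages. Stopping after only 2 rounds leaves processor 2 deciding 0 and processor 3 deciding 5, which shows why this baseline needs f + 1 rounds when crashes can hide a value for one round each.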