A framework for the design of dependent-failure algorithms: Research Articles

  • Authors:
  • Flavio Junqueira; Keith Marzullo

  • Affiliations:
  • Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA 92093, U.S.A.; Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA 92093, U.S.A.

  • Venue:
  • Concurrency and Computation: Practice & Experience - Parallel and Distributed Computing (EuroPar 2005)
  • Year:
  • 2007

Abstract

Dependent failures constitute a real problem in distributed systems. In this paper, we present a framework for the design of distributed algorithms for systems in which failures of processes are not necessarily independent or identically distributed. To derive this framework, we revisit the traditional way of designing distributed algorithms: assuming a threshold $t$ on the number of process failures and determining constraints on process replication of the form $n > k \cdot t$, where $n$ is the number of processes. Several important results in the literature are stated under this model, and we use observations from these results to derive a framework that enables more expressive characterizations of failures but still captures the essence of previous results. Our framework has two parts: a characterization of the subsets of processes that can fail in an execution, and properties that express constraints on process replication, called replication predicates. After presenting our model to characterize failures, we first consider a class of replication predicates that represent most well-known problems in distributed computing. Second, we extend this set to also include some predicates for problems that have unusual bounds. We argue that, although unusual, these predicates have important practical implications, such as using fewer replicas. Copyright © 2007 John Wiley & Sons, Ltd.
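The shift described above, from a threshold $t$ to a description of which subsets of processes can fail together, can be illustrated with a small check. This is a minimal sketch under our own assumptions: failures are encoded as a hypothetical list of fail-prone sets (processes that may all fail in one execution), and the condition "no $k$ fail-prone sets together contain every process" stands in for a replication predicate that generalizes $n > k \cdot t$. It is in the spirit of the framework but is not claimed to be the paper's exact definition; the names `threshold_fail_prone` and `no_k_cover` are illustrative. In the threshold model every fail-prone set has $t$ processes, so $k$ of them cover at most $k \cdot t$ processes, and the condition holds exactly when $n > k \cdot t$.

```python
from itertools import combinations, combinations_with_replacement
from typing import FrozenSet, Iterable, List

# A process is identified by an integer; a fail-prone set is a set of
# processes that may all fail in the same execution (hypothetical encoding).
Process = int
FailProne = FrozenSet[Process]


def threshold_fail_prone(n: int, t: int) -> List[FailProne]:
    """Classic threshold model: any t of the n processes may fail together."""
    return [frozenset(c) for c in combinations(range(n), t)]


def no_k_cover(processes: FrozenSet[Process],
               fail_prone: Iterable[FailProne], k: int) -> bool:
    """Illustrative replication predicate (an assumption, not the paper's
    definition): no k fail-prone sets, repetition allowed, jointly contain
    every process. Equivalently, any k of their complements share a process."""
    sets = list(fail_prone)
    for combo in combinations_with_replacement(sets, k):
        if frozenset().union(*combo) >= processes:
            return False  # these k failure patterns could take down everyone
    return True


if __name__ == "__main__":
    all4 = frozenset(range(4))
    # Processes 0 and 1 share a power supply and may fail together;
    # 2 and 3 fail only individually (hypothetical scenario).
    dependent = [frozenset({0, 1}), frozenset({2}), frozenset({3})]
    print(no_k_cover(all4, dependent, 2))                   # True
    # A threshold model must assume t = 2 to cover the correlated pair,
    # and 4 > 2 * 2 does not hold.
    print(no_k_cover(all4, threshold_fail_prone(4, 2), 2))  # False
```

In this toy scenario, four replicas satisfy the $k = 2$ condition once the correlated pair is described explicitly, whereas a threshold model that must set $t = 2$ would require $n > 4$. This only illustrates how a finer failure characterization can admit fewer replicas; the brute-force enumeration is exponential and meant for small examples.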