A framework for the design of dependent-failure algorithms: Research Articles

Authors:
Flavio Junqueira;Keith Marzullo
Affiliations:
Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA 92093, U.S.A.;Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA 92093, U.S.A.
Venue:
Concurrency and Computation: Practice & Experience - Parallel and Distributed Computing (EuroPar 2005)
Year:
2007

Abstract

Dependent failures constitute a real problem in distributed systems. In this paper, we present a framework for the design of distributed algorithms for systems in which failures of processes are not necessarily independent or identically distributed. To derive this framework, we revisit the traditional way of designing distributed algorithms: assuming a threshold t on the number of process failures and determining constraints on process replication of the form n > k · t. Under this model, there are several important results in the literature, and we use observations from these results to derive a framework that enables more expressive characterizations of failures but still captures the essence of previous results. Our framework has two parts: a characterization of the subsets of processes that can fail in an execution, and properties that express constraints on process replication, called replication predicates. After presenting our model to characterize failures, we first consider a class of replication predicates that represent most well-known problems in distributed computing. Second, we extend this set to include some predicates for problems that have unusual bounds. We argue that, although these bounds are unusual, they have important practical implications, such as using fewer replicas. Copyright © 2007 John Wiley & Sons, Ltd.
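The contrast between a threshold constraint of the form n > k · t and a constraint stated over the subsets of processes that can fail can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's definitions: the function names are hypothetical, and the set-based check ("no k failure sets together cover every process") is one well-known generalization of n > k · t, chosen here because it reduces to the threshold form when every set of t processes can fail.

```python
from itertools import combinations, combinations_with_replacement


def threshold_predicate(n, k, t):
    """Classic threshold constraint: n > k * t."""
    return n > k * t


def no_k_failure_sets_cover(processes, failure_sets, k):
    """Assumed generalization (not taken from the paper): the constraint
    holds if no choice of k failure sets covers every process."""
    processes = frozenset(processes)
    for combo in combinations_with_replacement(failure_sets, k):
        if frozenset().union(*combo) >= processes:
            return False
    return True


def threshold_failure_sets(processes, t):
    """Failure sets of the threshold model: every subset of size t."""
    return [frozenset(c) for c in combinations(processes, t)]


# Threshold model: both checks agree.
procs = ["a", "b", "c", "d"]
agree_4 = (threshold_predicate(4, 3, 1),
           no_k_failure_sets_cover(procs, threshold_failure_sets(procs, 1), 3))
agree_3 = (threshold_predicate(3, 3, 1),
           no_k_failure_sets_cover(procs[:3], threshold_failure_sets(procs[:3], 1), 3))

# Dependent failures (hypothetical scenario): a and b fail together,
# c can fail alone, d never fails.
dependent_sets = [frozenset({"a", "b"}), frozenset({"c"})]
dependent_ok = no_k_failure_sets_cover(procs, dependent_sets, 2)
# A threshold model must assume t = 2 (the largest failure set).
threshold_ok = threshold_predicate(len(procs), 2, 2)
```

In the dependent scenario, four processes satisfy the set-based check for k = 2, while the threshold form with t = 2 would require more than four. This is one way to read the abstract's remark that a more expressive failure characterization can allow fewer replicas.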