Static worksharing strategies for heterogeneous computers with unrecoverable interruptions

Authors:
Anne Benoit; Yves Robert; Arnold Rosenberg; Frédéric Vivien
Affiliations:
Anne Benoit, Yves Robert, Frédéric Vivien: Ecole Normale Supérieure de Lyon, 46 allée d'Italie, 69364 Lyon Cedex 07, France
Arnold Rosenberg: Colorado State University, Fort Collins, USA
Venue:
Parallel Computing
Year:
2011

Abstract

One has a large computational workload that is "divisible" (the granularity of its constituent tasks can be adjusted arbitrarily), and one has access to p remote computers that can assist in computing it. How can one best utilize these computers? Two features complicate the question. First, the remote computers may differ from one another in speed. Second, each remote computer is subject to interruptions of known likelihood that kill all work in progress on it. One wishes to share the workload with the remote computers in a way that maximizes the expected amount of work completed. We study three versions of this problem. The simplest version ignores communication costs but allows the computers to differ in speed (a heterogeneous set of computers). The other two versions account for communication costs, first with identical remote computers (a homogeneous set of computers) and then with computers that may differ in speed. We provide exact expressions for the optimal work expectation in all three versions: explicit closed-form expressions for the first two, and a recurrence that computes the optimal value for the third, most general version.
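The flavor of the simplest version (no communication costs, computers of different speeds) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' model or their closed-form result. It hypothetically assumes that computer i, of speed s_i, has been interrupted by time t with probability min(1, κ_i t) (a linear risk with κ_i > 0), that each computer receives a single chunk at time 0, and that a chunk's work counts only if the chunk completes. Under these assumptions the expected work is Σ w_i(1 − κ_i w_i / s_i), a separable concave quadratic, so the optimal chunk sizes follow from the KKT conditions. The names `Computer`, `success_probability`, `expected_work`, and `optimal_allocation` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    speed: float  # work units processed per unit of time
    risk: float   # hypothetical linear-risk rate (> 0): P(interrupted by time t) = min(1, risk * t)


def success_probability(c: Computer, work: float) -> float:
    """Probability that a chunk of `work` units finishes before an interruption."""
    duration = work / c.speed
    return max(0.0, 1.0 - c.risk * duration)


def expected_work(computers: list[Computer], allocation: list[float]) -> float:
    """Expected completed work: a chunk counts fully if it finishes, else not at all."""
    return sum(w * success_probability(c, w) for c, w in zip(computers, allocation))


def optimal_allocation(computers: list[Computer], total_work: float) -> list[float]:
    """Chunk sizes maximizing expected_work subject to sum(allocation) <= total_work.

    With r_i = speed_i / risk_i, the objective is sum(w_i - w_i**2 / r_i) on the
    region where every success probability is positive: a separable concave
    quadratic, so the KKT conditions yield a closed form.
    """
    ratios = [c.speed / c.risk for c in computers]
    unconstrained = [r / 2 for r in ratios]  # per-computer maximizer of w - w**2 / r
    if sum(unconstrained) <= total_work:
        return unconstrained  # surplus work is left unassigned
    # Binding budget: equal marginal gain 1 - 2 * w_i / r_i on every computer,
    # which makes each chunk proportional to r_i.
    total_ratio = sum(ratios)
    return [total_work * r / total_ratio for r in ratios]
```

For example, with two computers of speeds 2 and 1 sharing the risk rate κ = 0.5, a workload of 2 units is split into chunks of 4/3 and 2/3, for an expected 4/3 units completed. With 10 units available, each computer instead receives its unconstrained optimum s_i/(2κ_i), and the surplus stays unassigned, since larger chunks would lower the expectation in this toy model.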