Static worksharing strategies for heterogeneous computers with unrecoverable failures

  • Authors:
  • Anne Benoit;Yves Robert;Arnold Rosenberg;Frédéric Vivien

  • Affiliations:
  • Ecole Normale Supérieure de Lyon, France and LIP, UMR, ENS, CNRS, INRIA, UCBL, Lyon, France;Ecole Normale Supérieure de Lyon, France and LIP, UMR, ENS, CNRS, INRIA, UCBL, Lyon, France;Colorado State University, Fort Collins;INRIA, France and LIP, UMR, ENS, CNRS, INRIA, UCBL, Lyon, France

  • Venue:
  • Euro-Par '09: Proceedings of the 2009 International Conference on Parallel Processing
  • Year:
  • 2009

Abstract

One has a large workload that is "divisible" (its constituent work's granularity can be adjusted arbitrarily), and one has access to p remote computers that can assist in computing the workload. How can one best utilize these computers toward this end? Two features complicate this question. First, the remote computers may differ from one another in speed. Second, each remote computer is subject to interruptions of known likelihood that kill all work in progress on it. One wishes to orchestrate the sharing of the workload with the remote computers in a way that maximizes the expected amount of work completed, given the risk of interruptions. We consider three versions of the preceding problem. Two versions envision heterogeneous computing resources, in which the remote computers may differ from one another in speed; one version envisions homogeneous computing resources, in which the remote computers are identical. One of the heterogeneous versions ignores communication costs (i.e., assumes that they are negligible); the other two versions account explicitly for communication costs. We provide exact expressions for the optimal work expectation for all three versions of the problem. For the most general version (heterogeneous resources with communication costs), we provide a recurrence for computing this expectation; for the other two versions, we provide closed-form expressions.
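The abstract does not state the interruption model, so the sketch below is only an illustration of the optimization it describes, under assumptions not taken from the abstract. It assumes a hypothetical linear risk model in which computer i (speed s_i, in work units per time unit) is interrupted before time t with probability kappa_i * t. It also assumes that each computer receives a single chunk and that communication is free, as in the heterogeneous version without communication costs. A chunk of w units then completes with probability 1 - kappa_i * w / s_i, and it contributes w * (1 - kappa_i * w / s_i) to the expected work. The names `Computer`, `kappa`, `expected_work` and `allocate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    speed: float  # work units processed per unit of time
    kappa: float  # hypothetical linear risk rate: Pr(interrupted by time t) = kappa * t


def expected_work(chunk, computer):
    """Expected completed work for one chunk: it counts only if the chunk finishes."""
    p_fail = min(1.0, computer.kappa * chunk / computer.speed)
    return chunk * (1.0 - p_fail)


def allocate(total_work, computers):
    """Split total_work into one chunk per computer to maximize expected completed work.

    Assumes the linear risk model above, free communication, and kappa > 0.
    """
    # With no workload limit, each computer's best chunk is s_i / (2 * kappa_i),
    # where the marginal gain 1 - 2 * kappa_i * w / s_i drops to zero.
    ideal = [c.speed / (2.0 * c.kappa) for c in computers]
    # If the workload is smaller than the sum of ideal chunks, shrink every chunk
    # by the same factor. This equalizes the marginal gains (all become 1 - scale),
    # which is optimal for this separable concave objective.
    scale = min(1.0, total_work / sum(ideal))
    chunks = [scale * w for w in ideal]
    total = sum(expected_work(w, c) for w, c in zip(chunks, computers))
    return chunks, total
```

For example, with `machines = [Computer(1.0, 0.1), Computer(2.0, 0.1)]`, calling `allocate(6.0, machines)` gives chunks of about 2 and 4 units and an expected 4.8 units of completed work. An equal split of 3 and 3 units yields only 4.65. If the workload exceeds the sum of the ideal chunks (15 units here), the sketch withholds the excess, because under this model a larger chunk adds more risk than work.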