Greedy "exploitation" is close to optimal on node-heterogeneous clusters

  • Authors:
  • Arnold L. Rosenberg

  • Affiliations:
  • Colorado State University, Fort Collins, CO and Northeastern University, Boston, MA

  • Venue:
  • Euro-Par'11: Proceedings of the 17th International Conference on Parallel Processing, Volume Part I
  • Year:
  • 2011

Abstract

The Cluster-Exploitation Problem (CEP) challenges a master computer to schedule a "borrowed" node-heterogeneous cluster C of worker computers so as to maximize the amount of work that C's computers complete within a fixed time period. The challenge is heightened by the fact that "completing" work requires C's computers to return the results of their work to the master. It has been known for some time that the greedy LIFO protocol, which has C's computers finish working in the reverse of their starting order, does not solve the CEP optimally; in fact, the FIFO protocol, which has C's computers finish working in the same order as they start, does solve the CEP optimally (over sufficiently long time periods). That said, the LIFO protocol has features (aside from its intuitive appeal) that would make it attractive to implement when solving the CEP, as long as its solution to the problem is not too far from optimal. This paper shows that this is indeed the case. Specifically:

1. The LIFO protocol provides approximately optimal solutions to the CEP, in the following sense. For every cluster C, there is a fixed fraction ϕ_C > 0, which does not depend on how heterogeneous C is (as measured by the relative speeds of its fastest and slowest computers), such that C completes at least ϕ_C times as much work under the LIFO protocol as under the optimal FIFO protocol.

Our analysis of the CEP also uncovers an unexpected property of the LIFO protocol:

2. In common with the FIFO protocol, the LIFO protocol's work production is independent of the order in which the master supplies work to the workers, no matter what the relative speeds of the workers are.

Within the literature of divisible load scheduling, the CEP follows the master-worker paradigm under the "single-port with no overlap" model.
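To make the order-independence claim for LIFO concrete, here is a minimal Python sketch under a deliberately simplified linear-cost model. This model is an assumption for illustration and is not necessarily the paper's model: sending one unit of work takes c time units, worker k computes one unit in r_k time units, returning one unit of results takes d time units, and the master's single port carries one message at a time. In a tight LIFO (nested) schedule over a period of length T, the first-served worker gets w_1 = T/(c+d+r_1), and each later worker gets w_{k+1} = r_k w_k/(c+d+r_{k+1}). The total then telescopes to W = T/(c+d) · (1 − ∏_k r_k/(c+d+r_k)), which is symmetric in the r_k, so in this toy model the service order cannot change the total. The function names and parameter values below are hypothetical.

```python
from itertools import permutations
from math import isclose, prod


def lifo_allocations(T, c, d, rates):
    """Work w_k given to each worker in a tight LIFO schedule.

    Assumed simplified model (illustrative, not the paper's exact model):
      * sending one unit of work to any worker takes c time units;
      * worker k computes one unit in rates[k] time units;
      * returning the results of one unit takes d time units;
      * single port, no overlap: the master sends or receives one
        message at a time and never does both at once.
    Workers are served in list order and return results in reverse order.
    """
    # The first-served worker returns last: its send starts at 0 and its
    # return ends at T, with its computation in between and no idle gaps.
    w = T / (c + d + rates[0])
    allocs = [w]
    for r_prev, r_next in zip(rates, rates[1:]):
        # Worker k finishes computing exactly when worker k+1 finishes
        # returning: r_k * w_k = (c + r_{k+1} + d) * w_{k+1}.
        w = r_prev * w / (c + d + r_next)
        allocs.append(w)
    return allocs


def lifo_makespan(c, d, rates, allocs):
    """Replay the LIFO schedule and return when the last result arrives."""
    recv_end, t = [], 0.0
    for w in allocs:  # sends go out back to back, in service order
        t += c * w
        recv_end.append(t)
    port_free = t  # the port is busy with sends until this moment
    for k in reversed(range(len(allocs))):  # results return in reverse order
        ready = recv_end[k] + rates[k] * allocs[k]
        start = max(port_free, ready)  # wait for both the port and the worker
        port_free = start + d * allocs[k]
    return port_free


def lifo_total_work(T, c, d, rates):
    """Closed form for sum(lifo_allocations(T, c, d, rates))."""
    return T / (c + d) * (1 - prod(r / (c + d + r) for r in rates))


# Hypothetical cluster: per-unit compute times of four heterogeneous workers.
T, c, d = 100.0, 1.0, 0.5
rates = [2.0, 3.0, 5.0, 8.0]
reference = lifo_total_work(T, c, d, rates)
for order in permutations(rates):
    allocs = lifo_allocations(T, c, d, list(order))
    assert isclose(lifo_makespan(c, d, list(order), allocs), T)  # schedule fits
    assert isclose(sum(allocs), reference)  # same total for every order
print(f"every service order completes {reference:.4f} units of work")
```

The replay in `lifo_makespan` checks that the computed allocations really fit in the period under the single-port constraint, while `lifo_total_work` exposes the symmetry directly. Neither function says anything about the FIFO comparison in claim 1, which depends on the paper's own model.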