Probabilistic resource allocation in heterogeneous distributed systems with random failures

  • Authors:
  • Vladimir Shestak;Edwin K. P. Chong;Anthony A. Maciejewski;Howard Jay Siegel

  • Affiliations:
  • Ricoh InfoPrint Solutions, 6300 Diagonal Highway, Boulder, CO 80301, United States;Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523-1373, United States and Department of Mathematics, Colorado State University, Fort Collins, CO ...;Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523-1373, United States;Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523-1373, United States and Department of Computer Science, Colorado State University, Fort Collins ...

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2012

Abstract

Finding efficient workload distribution techniques is increasingly important for heterogeneous distributed systems in which the availability of compute nodes may change spontaneously over time. Resource allocation policies designed for such systems should maximize performance while remaining robust against the failure and recovery of compute nodes. This work proposes such a policy, based on the concepts of the Derman-Lieberman-Ross theorem, and applies it to a simulated model of a dedicated system composed of a set of heterogeneous image processing servers. Assuming that each image yields a "reward" if its processing is completed before a certain deadline, the goal of the resource allocation policy is to maximize the expected cumulative reward. An extensive analysis studies the performance of the proposed policy and compares it with that of several existing policies adapted to this environment. Experiments conducted for various types of task-machine heterogeneity illustrate the potential of the method for solving resource allocation problems in a broad spectrum of distributed systems that experience high failure rates.