Safety scheduling strategies in distributed computing

  • Authors:
  • Victor V. Toporkov; Alexey Tselishchev

  • Affiliations:
  • Computer Science Department, Moscow Power Engineering Institute, ul. Krasnokazarmennaya 14, Moscow, 111250 Russia; European Organization for Nuclear Research (CERN), 1211 Geneva 23, Switzerland

  • Venue:
  • International Journal of Critical Computer-Based Systems
  • Year:
  • 2010

Abstract

In this paper, we present an approach to safety scheduling in distributed computing based on strategies of resource co-allocation for complex sets of tasks (jobs). Guaranteeing that jobs complete within their time limits requires taking into account the dynamics of the distributed environment, namely, changes in the number of jobs to be serviced, computation volumes, possible failures of processor nodes, etc. Consequently, in the general case, a single schedule is not enough: a set of alternative versions of scheduling and resource co-allocation, called a strategy, is required. Safety strategies are formed for structurally different job models with various levels of task granularity and data replication policies. We develop and consider scheduling strategies that combine fine-grain and coarse-grain computations, multiple data replicas, and constrained data movement. These strategies are evaluated in simulation studies against a variety of metrics.
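
The idea of a strategy as a set of schedule versions, rather than a single schedule, can be illustrated with a minimal sketch. This is not the authors' algorithm: the names ScheduleVersion and select_version, the job of three tasks, and the selection rule (take the first version that uses only currently available nodes and finishes by the deadline) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScheduleVersion:
    """One candidate schedule (hypothetical model, not from the paper)."""
    name: str
    # task id -> (node id, start time, finish time)
    allocation: dict

    def makespan(self):
        # Time at which the last task of the job finishes.
        return max(finish for _, _, finish in self.allocation.values())

    def nodes(self):
        # Set of processor nodes this version relies on.
        return {node for node, _, _ in self.allocation.values()}


def select_version(strategy, available_nodes, deadline):
    """Return the first version that is feasible under current conditions.

    A version is treated as feasible if every node it uses is available
    and its makespan does not exceed the deadline. Returns None if no
    version in the strategy qualifies.
    """
    available = set(available_nodes)
    for version in strategy:
        if version.nodes() <= available and version.makespan() <= deadline:
            return version
    return None


# A strategy for a hypothetical job of three tasks: a parallel version
# that needs two nodes, and a slower fallback that needs only one.
strategy = [
    ScheduleVersion("parallel", {"t1": ("n1", 0, 4), "t2": ("n2", 0, 3), "t3": ("n1", 4, 7)}),
    ScheduleVersion("fallback", {"t1": ("n1", 0, 4), "t2": ("n1", 4, 7), "t3": ("n1", 7, 10)}),
]

# Node n2 has failed, so only the fallback version remains feasible.
chosen = select_version(strategy, available_nodes={"n1"}, deadline=10)
print(chosen.name if chosen else "no feasible version")
```

In this sketch, a failure of node n2 does not force rescheduling from scratch, because the strategy already contains a version that avoids that node. If the deadline were tighter than the fallback's makespan, no version would qualify and the job could not be guaranteed.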