Rescheduling for reliable job completion with the support of clouds

  • Authors:
  • Young Choon Lee; Albert Y. Zomaya

  • Affiliations:
  • Centre for Distributed and High Performance Computing, School of Information Technologies, The University of Sydney, NSW 2006, Australia (both authors)

  • Venue:
  • Future Generation Computer Systems
  • Year:
  • 2010

Abstract

A major performance issue in large-scale decentralized distributed systems, such as grids, is ensuring that jobs finish execution within their estimated completion times despite fluctuations in resource performance. Several techniques, including advance reservation, rescheduling, and migration, have previously been adopted to resolve or mitigate this issue; however, they face non-negligible practical hurdles. Clouds may be an attractive alternative, since cloud resources are much more reliable than grid resources. This paper investigates the effectiveness of rescheduling onto cloud resources to increase the reliability of job completion. Specifically, schedules are initially generated using grid resources, and cloud resources (which are relatively costlier) are used only for rescheduling to cope with delays in job completion. A job in our study is a bag-of-tasks (BoT) application consisting of a large number of independent tasks; this job model is common in many science and engineering applications. We have devised a novel rescheduling technique, called rescheduling using clouds for reliable completion (RC²), and applied it to three well-known existing heuristics. Our experimental results show that RC² significantly reduces delays in job completion.
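The general scheme described above (schedule on grid resources first, then fall back to costlier cloud resources only when a delay is detected) can be illustrated with a minimal Python sketch. This is not RC² itself, whose details the abstract does not give. The greedy earliest-completion-time heuristic, the `Resource` class, the `schedule_ect` and `reschedule_with_cloud` functions, and the rule of moving queued tasks from a late grid resource onto the cloud until that resource is projected to meet the deadline are all hypothetical simplifications. The sketch also assumes task sizes are known in advance, that each queue holds only not-yet-started tasks at the moment a slowdown is detected, and that cloud speeds do not fluctuate.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A compute resource; `speed` is work units processed per time unit."""
    name: str
    speed: float
    is_cloud: bool = False
    queue: list = field(default_factory=list)  # sizes of tasks not yet started

    def finish_time(self) -> float:
        """Projected time (from now) to drain this resource's queue."""
        return sum(self.queue) / self.speed


def schedule_ect(tasks, resources):
    """Greedy earliest-completion-time assignment, largest task first.

    A stand-in list-scheduling heuristic, not one of the paper's heuristics.
    Returns the estimated makespan, used here as the completion deadline.
    """
    for size in sorted(tasks, reverse=True):
        best = min(resources, key=lambda r: (sum(r.queue) + size) / r.speed)
        best.queue.append(size)
    return max(r.finish_time() for r in resources)


def reschedule_with_cloud(grid, cloud_pool, deadline):
    """Hypothetical rescheduling step: while a grid resource is projected to
    miss the deadline, move its last queued task to the cloud resource that
    would finish it earliest. Returns the names of cloud resources used."""
    used = set()
    for r in grid:
        while r.queue and r.finish_time() > deadline:
            size = r.queue.pop()
            target = min(cloud_pool, key=lambda c: (sum(c.queue) + size) / c.speed)
            target.queue.append(size)
            used.add(target.name)
    return sorted(used)


# Demo: eight independent tasks (a small bag of tasks) on two grid resources.
grid = [Resource("g1", 1.0), Resource("g2", 1.0)]
deadline = schedule_ect([4, 4, 4, 4, 2, 2, 2, 2], grid)

# A performance fluctuation halves g2's speed, so it would now finish late.
grid[1].speed = 0.5
late_before = grid[1].finish_time()

# Fall back to a (costlier) cloud instance only for the delayed work.
cloud = [Resource("c1", 2.0, is_cloud=True)]
used = reschedule_with_cloud(grid, cloud, deadline)

print("deadline:", deadline, "| g2 before rescheduling:", late_before)
print("grid finish:", [r.finish_time() for r in grid], "| cloud used:", used)
print("cloud finish:", [c.finish_time() for c in cloud])
```

In the demo, halving the speed of `g2` pushes its projected finish from 12 to 24 time units; moving three of its queued tasks to the single cloud instance brings it back to 8, within the original estimate of 12. A fuller version would also check that the cloud instances themselves meet the deadline and weigh the cost of each instance it uses.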