Near-optimal scheduling mechanisms for deadline-sensitive jobs in large computing clusters

  • Authors:
  • Navendu Jain;Ishai Menache;Joseph Naor;Jonathan Yaniv

  • Affiliations:
  • Microsoft Research, Redmond, WA, USA;Microsoft Research, Redmond, WA, USA;Technion - Israel Institute of Technology, Haifa, Israel;Technion - Israel Institute of Technology, Haifa, Israel

  • Venue:
  • Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

We consider a market-based resource allocation model for batch jobs in cloud computing clusters. In our model, we incorporate the importance of the due date of a job rather than the number of servers allocated to it at any given time. Each batch job is characterized by the work volume of total computing units (e.g., CPU hours) along with a bound on maximum degree of parallelism. Users specify, along with these job characteristics, their desired due date and a value for finishing the job by its deadline. Given this specification, the primary goal is to determine the scheduling} of cloud computing instances under capacity constraints in order to maximize the social welfare (i.e., sum of values gained by allocated users). Our main result is a new ( C/(C-k) ⋅ s/(s-1))-approximation algorithm for this objective, where C denotes cloud capacity, k is the maximal bound on parallelized execution (in practical settings, k l C) and s is the slackness on the job completion time i.e., the minimal ratio between a specified deadline and the earliest finish time of a job. Our algorithm is based on utilizing dual fitting arguments over a strengthened linear program to the problem. Based on the new approximation algorithm, we construct truthful allocation and pricing mechanisms, in which reporting the job true value and properties (deadline, work volume and the parallelism bound) is a dominant strategy for all users. To that end, we provide a general framework for transforming allocation algorithms into truthful mechanisms in domains of single-value and multi-properties. We then show that the basic mechanism can be extended under proper Bayesian assumptions to the objective of maximizing revenues, which is important for public clouds. We empirically evaluate the benefits of our approach through simulations on data-center job traces, and show that the revenues obtained under our mechanism are comparable with an ideal fixed-price mechanism, which sets an on-demand price using oracle knowledge of users' valuations. Finally, we discuss how our model can be extended to accommodate uncertainties in job work volumes, which is a practical challenge in cloud settings.