Efficient online scheduling for deadline-sensitive jobs: extended abstract

Authors:
Brendan Lucier; Ishai Menache; Joseph (Seffi) Naor; Jonathan Yaniv
Affiliations:
Microsoft Research, Cambridge, MA, USA; Microsoft Research, Redmond, WA, USA; Technion - Israel Institute of Technology, Haifa, Israel; Technion - Israel Institute of Technology, Haifa, Israel
Venue:
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Year:
2013

Abstract

We consider mechanisms for online deadline-aware scheduling in large computing clusters. Batch jobs that run on such clusters often require guarantees on their completion time (i.e., deadlines). However, most existing scheduling systems implement fair-share resource allocation between users, an approach that ignores heterogeneity in job requirements and may cause deadlines to be missed. In our framework, jobs arrive dynamically and are characterized by their value and total resource demand (or an estimate thereof), along with their reported deadlines. The scheduler's objective is to maximize the aggregate value of jobs completed by their deadlines. We circumvent known lower bounds for this problem by assuming that the input has slack, meaning that any job could be delayed and still finish by its deadline. Under the slackness assumption, we design a preemptive scheduler with a constant-factor worst-case performance guarantee. Along the way, we pay close attention to practical aspects, such as runtime efficiency, data locality, and demand uncertainty. We evaluate the algorithm via simulations over real job traces taken from a large production cluster, and show that its actual performance is significantly better than that of other heuristics used in practice. We then extend our framework to handle provider commitments: the requirement that jobs admitted to service must be executed until completion. We prove that no algorithm can obtain worst-case guarantees when the commitment decision must be made at job arrival time. Nevertheless, we design efficient heuristics that commit on job admission, in the spirit of our basic algorithm. We show empirically that these heuristics perform just as well as (or better than) the original algorithm. Finally, we discuss how our scheduling framework can be used to design truthful scheduling mechanisms, motivated by applications to commercial public cloud offerings.
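
The job model and slackness assumption described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Job` fields, the single-resource setting, the reading of slack as the ratio of a job's availability window to its demand, and the helper names `slack`, `has_slack`, and `completed_value`. The paper's formal definitions may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Job:
    """Hypothetical job record: fields mirror the abstract's description."""
    arrival: float   # time the job is submitted
    deadline: float  # reported deadline
    demand: float    # total resource demand (time units on one unit of resource)
    value: float     # value gained only if the job finishes by its deadline

    def __post_init__(self) -> None:
        if self.demand <= 0:
            raise ValueError("demand must be positive")
        if self.deadline < self.arrival:
            raise ValueError("deadline must not precede arrival")


def slack(job: Job) -> float:
    """Assumed slack measure: availability window divided by demand."""
    return (job.deadline - job.arrival) / job.demand


def has_slack(jobs: List[Job], s: float) -> bool:
    """True if every job's window is at least s times its demand."""
    return all(slack(j) >= s for j in jobs)


def completed_value(jobs: List[Job], finish: Dict[int, float]) -> float:
    """Objective from the abstract: total value of jobs finished on time.

    `finish` maps a job's index to its completion time; jobs that never
    complete are simply absent from the mapping.
    """
    return sum(
        job.value
        for i, job in enumerate(jobs)
        if i in finish and finish[i] <= job.deadline
    )


if __name__ == "__main__":
    jobs = [Job(0.0, 10.0, 4.0, 5.0), Job(2.0, 6.0, 2.0, 3.0)]
    print([slack(j) for j in jobs])
    print(has_slack(jobs, 2.0))
    # Job 1 finishes after its deadline, so only job 0 contributes value.
    print(completed_value(jobs, {0: 9.0, 1: 7.0}))
```

Under this reading, a slack factor above 1 means each job could start late and still meet its deadline if run without further interruption, which is the property the abstract relies on to circumvent the known lower bounds.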