We consider mechanisms for online deadline-aware scheduling in large computing clusters. Batch jobs that run on such clusters often require guarantees on their completion time (i.e., deadlines). However, most existing scheduling systems implement fair-share resource allocation between users, an approach that ignores heterogeneity in job requirements and may cause deadlines to be missed. In our framework, jobs arrive dynamically and are characterized by their value and total resource demand (or an estimate thereof), along with their reported deadlines. The scheduler's objective is to maximize the aggregate value of jobs completed by their deadlines. We circumvent known lower bounds for this problem by assuming that the input has slack, meaning that any job could be delayed and still finish by its deadline. Under the slackness assumption, we design a preemptive scheduler with a constant-factor worst-case performance guarantee. Along the way, we pay close attention to practical aspects, such as runtime efficiency, data locality, and demand uncertainty. We evaluate the algorithm via simulations over real job traces taken from a large production cluster, and show that its actual performance is significantly better than that of other heuristics used in practice. We then extend our framework to handle provider commitments: the requirement that jobs admitted to service must be executed until completion. We prove that no algorithm can obtain worst-case guarantees when the commitment decision must be made at job arrival time. Nevertheless, we design efficient heuristics that commit on job admission, in the spirit of our basic algorithm. We show empirically that these heuristics perform at least as well as the original algorithm. Finally, we discuss how our scheduling framework can be used to design truthful scheduling mechanisms, motivated by applications to commercial public cloud offerings.
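To make the job model and the slackness assumption concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's formal definitions: the abstract does not spell out how slack is measured, so the sketch assumes a job's slack is the ratio of its availability window (deadline minus arrival) to its demand, with demand expressed as processing time on a single resource unit. The names `Job`, `slack`, `input_slack`, and `completed_value` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Job:
    arrival: float   # time at which the job is submitted
    deadline: float  # reported deadline
    demand: float    # total resource demand (assumed: time units on one resource unit)
    value: float     # value gained only if the job completes by its deadline


def slack(job: Job) -> float:
    """Assumed slack measure: availability window divided by demand."""
    return (job.deadline - job.arrival) / job.demand


def input_slack(jobs: Sequence[Job]) -> float:
    """Slack of the whole input: the smallest slack over all jobs."""
    return min(slack(j) for j in jobs)


def completed_value(jobs: Sequence[Job],
                    finish_times: Sequence[Optional[float]]) -> float:
    """Aggregate value of jobs that finish by their deadlines.

    finish_times[i] is the completion time of jobs[i], or None if the
    job was never completed (for example, preempted and abandoned).
    """
    return sum(
        j.value
        for j, f in zip(jobs, finish_times)
        if f is not None and f <= j.deadline
    )


jobs = [
    Job(arrival=0, deadline=10, demand=2, value=5.0),
    Job(arrival=1, deadline=4, demand=3, value=2.0),
]
```

Under this assumed measure, a slack of exactly 1 means the job must start the moment it arrives and run uninterrupted to meet its deadline, while a slack larger than 1 leaves room for it to be delayed or preempted, which is the property the slackness assumption relies on. The objective function counts a job's value only when its finish time is at or before its deadline.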