Modern production clusters are often shared by multiple types of jobs with different priorities in order to improve resource utilization. Preemption is a common technique employed by MapReduce schedulers to avoid delaying production jobs while still allowing non-production jobs to share the cluster; it also prevents a large job from occupying too many resources and starving others. Recent literature shows that jobs in production MapReduce clusters have a mixture of lengths and sizes spanning many orders of magnitude. In this type of environment, the preemption policy currently used by MapReduce schedulers can significantly delay the completion of long-running tasks, wasting the work they have already done. This paper first discusses the heterogeneous nature of MapReduce jobs and their arrival rates in several production clusters. Second, we characterize the situations in which the current preemption policy incurs a significant preemption penalty. We then propose a simple mechanism that works in conjunction with existing job schedulers to address this problem. Finally, we evaluate our solution under various types of workloads on Amazon EC2. Experiments show that our method improves normalized system performance by 15% during busy periods by effectively avoiding unnecessary preemption while preserving fairness.
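The intuition behind avoiding unnecessary preemption can be sketched in a few lines. The following is a minimal illustration, not the paper's actual mechanism: the `Task` fields, the `max_wait` threshold, and the victim-selection rule are all assumptions made for this sketch. The idea is that killing a nearly-finished task throws away almost all of its work, so when every candidate task will finish soon, it is cheaper to let the arriving production job wait briefly; otherwise, preempt the task that has accumulated the least work.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Task:
    task_id: str
    elapsed: float             # seconds of work already done (lost if preempted)
    remaining_estimate: float  # estimated seconds until completion

def pick_preemption_victim(tasks: List[Task], max_wait: float) -> Optional[Task]:
    """Choose a task to preempt for an arriving production job.

    Hypothetical policy for illustration: skip tasks that would finish
    within `max_wait` seconds anyway (waiting is cheaper than discarding
    nearly-complete work), and among the rest pick the task whose lost
    work (elapsed time) is smallest.
    """
    candidates = [t for t in tasks if t.remaining_estimate > max_wait]
    if not candidates:
        return None  # every task finishes soon; let the production job wait
    return min(candidates, key=lambda t: t.elapsed)
```

For example, with one task that has run 100s but needs only 5s more, and another that has run 10s with 500s remaining, the policy preempts the second task; if only the nearly-finished task is running, it preempts nothing and the production job waits.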