Distributed oblivious load balancing using prioritized job replication

Authors:
Amir Nahir;Ariel Orda;Danny Raz
Affiliations:
Technion, Israel Institute of Technology, Haifa, Israel;Technion, Israel Institute of Technology, Haifa, Israel;Technion, Israel Institute of Technology, Haifa, Israel
Venue:
Proceedings of the 8th International Conference on Network and Service Management
Year:
2012

Abstract

Load balancing in large distributed server systems is a complex optimization problem of critical importance in cloud systems and data centers. However, any full (i.e., optimal) solution incurs significant, often prohibitive, overhead because it must collect state-dependent information. We propose a novel scheme that incurs no communication overhead between the users and the servers upon job arrivals, thus removing any scheduling overhead from the job execution's critical path. Furthermore, our scheme is oblivious, that is, it does not use any state information. In addition to the regular job requests, which are assigned to randomly chosen servers, our approach creates replicas that are sent to different servers; these replicas are served at low priority, so that they do not add any real burden on the servers. Through analysis and simulations we show that the expected system performance improves by up to a factor of 2 (even under high load conditions) if job lengths are exponentially distributed, and by over a factor of 5 when job lengths adhere to heavy-tailed distributions. We implemented a load balancing system based on our approach and deployed it on the Amazon Elastic Compute Cloud (EC2). Realistic load tests on that system indicate that the actual performance is as predicted.
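
The replication idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Server`, `dispatch`, and `next_job`, the use of two FIFO queues per server, and the rule that a copy is skipped once the job has completed elsewhere are all assumptions made for illustration. The abstract does not specify how replicas are cancelled or whether low-priority work is preempted, so neither is modeled here beyond that skip rule.

```python
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server: primaries in `high`, replicas in `low`."""
    high: deque = field(default_factory=deque)
    low: deque = field(default_factory=deque)

    def next_job(self, done):
        """Return the next job to run, or None if nothing is pending.

        Primaries are always taken before replicas, so replicas only
        use capacity that would otherwise sit idle. Copies of jobs that
        already finished elsewhere (ids in `done`) are discarded
        (assumed cancellation rule).
        """
        for queue in (self.high, self.low):
            while queue:
                job = queue.popleft()
                if job not in done:
                    return job
        return None


def dispatch(job, servers, rng, replicas=1):
    """Place one primary and `replicas` low-priority copies of `job`.

    The choice is oblivious: servers are drawn uniformly at random
    without inspecting any queue or load information. `rng.sample`
    guarantees the chosen servers are distinct.
    """
    chosen = rng.sample(range(len(servers)), replicas + 1)
    servers[chosen[0]].high.append(job)      # regular request
    for idx in chosen[1:]:
        servers[idx].low.append(job)         # low-priority replica
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)                   # fixed seed for repeatability
    servers = [Server() for _ in range(4)]
    placement = dispatch("job-1", servers, rng)
    print("primary on server", placement[0], "replica on server", placement[1])
```

In this sketch the dispatcher never contacts a server before sending a job, which mirrors the claim that no scheduling step sits on the job's critical path; the only coordination assumed is the shared `done` set, which stands in for whatever completion signal a real deployment would use.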