Analysis and simulation of a fair queueing algorithm
SIGCOMM '89 Symposium proceedings on Communications architectures & protocols
A simple load balancing scheme for task allocation in parallel machines
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
How Useful Is Old Information?
IEEE Transactions on Parallel and Distributed Systems
The Power of Two Choices in Randomized Load Balancing
IEEE Transactions on Parallel and Distributed Systems
Quincy: fair scheduling for distributed computing clusters
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Hadoop: The Definitive Guide
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Proceedings of the 5th European conference on Computer systems
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Dremel: interactive analysis of web-scale datasets
Proceedings of the VLDB Endowment
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Mesos: a platform for fine-grained resource sharing in the data center
Proceedings of the 8th USENIX conference on Networked systems design and implementation
A generalization of multiple choice balls-into-bins
Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Modeling and synthesizing task placement constraints in Google compute clusters
Proceedings of the 2nd ACM Symposium on Cloud Computing
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Why let resources idle? aggressive cloning of jobs with dolly
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Performance isolation and fairness for multi-tenant cloud storage
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Communications of the ACM
Adaptive load sharing in homogeneous distributed systems
IEEE Transactions on Software Engineering
Shark: SQL and rich analytics at scale
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Omega: flexible, scalable schedulers for large compute clusters
Proceedings of the 8th ACM European Conference on Computer Systems
The case for tiny tasks in compute clusters
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Hi-index | 0.00 |
Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an ideal scheduler.